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ABSTRACT 

We present EXOFAST, a fast, robust suite of routines written in IDL which is designed to fit exoplanetary 
transits and radial velocity variations simultaneously or separately, and characterize the parameter uncertainties 
and covariances with a Differential Evolution Markov Chain Monte Carlo method. We describe how our code 
incorporates both data sets to simultaneously derive stellar parameters along with the transit and RV parameters, 
resulting in more self-consistent results on an example fit of the discovery data of HAT-P-3b that is well-mixed 
in under five minutes on a standard desktop computer We describe in detail how our code works and outline 
ways in which the code can be extended to include additional effects or generalized for the characterization of 
other data sets - including non-planetary data sets. We discuss the pros and cons of several common ways to 
parameterize eccentricity, highlight a subtle mistake in the implementation of MCMC that could bias the in- 
ferred eccentricity of intrinsically circular orbits to significantly non-zero results, discuss a problem with IDE's 
built-in random number generator in its application to large MCMC fits, and derive a method to analytically fit 
the linear and quadratic limb darkening coefficients of a planetary transit. Finally, we explain how we achieved 
improved accuracy and over a factor of 100 improvement in the execution time of the transit model calculation. 
Our entire source code, along with an easy- to-use online i nterface for several basic features of our transit and 
radial velocity fitting, are available onUne at |http://astroutils.astronomy.ohio-state.edu/exofast| 
Subject headings: IDL, transit, radial velocity, Markov Chain, MCMC, eccentricity bias, limb darkening,Data 
Analysis and Techniques 



1. INTRODUCTION 

In the mere 17 years since the first discovery of a "Hot 
Jupite r" around a main sequence star (Mayor & Ouelo^ 
Il995h . the study of exoplanets has exploded into one of the 
most vibrant and rapidly developing fields in astronomy to- 
day. Over 500 exoplanets have been confirme d, and four 
times that many candidates have been identified ( Bataiha et al.l 
l20T2h . The pace of exoplanet discoveries has consistently 
increased, with new exoplanet search methods continuously 
being developed and implemented, and new and surprising 
classes of planets routinely being uncovered as new regimes 
of parameter space are explored. Meanwhile, an enormous 
amount of effort is being put into developing theories of planet 
formation and evolution that can encompass the astonishing 
diversity of planetary systems that has emerged. 

The transit method has been at the center of this revo- 
lution, not only because it has expanded the region of pa- 
rameter space to which we are sensitive, but more impor- 
tantly, it can provide a seemingly endless wealth of infor- 
mation about each planet (see Winn 2010 for a compre- 
hensive review). For example, precise photometry during 
the primary transit can be used to measure the planet ra- 
dius and orbital inclination, and when combined with the 
minimum mass inferred from radial velocity (RV) studies, 
yield the true planet mass and average density, thereby 
constraining the planet's structure (Guillot 2005; Sat o et aLl 
12005; Charbonneau et al. 2006; Fortney et al. 2006). Pho- 
tometric observations during both primary transits and sec- 
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ondary echpses enable the study of their atm ospheres 
dCharbonnea u et al.ll2002 UVidal-Mad iar et al."200 3i and ther- 
mal emissio n (Deming et al. ||2005t [Charbonneau etaH 120061; 
iDeming etal.ii2006i) . Variations in the timing and shape of 
the eclipses and tra nsits hint at t he existence of other bod- 
ies in the system (Miralda-Escude 2002; Holman & Murray 
20051; lAgol et al. 2005; Steffen & Agol 2005; Ford & Gaudi 
20061 IFord&HolmanI I2007I) . constrain orbital evolution 



due t o tides or o t her e ffects (ISasselovl 120031: iFabryckvl 
120081; iHelher et all 120091). probe "weather" in exoplanet 
atmospheres dRauscher et aLl 12007b . and provide a probe 
of the inte rior structure and oblateness of exoplanets 
dRagozzine & Wolf 2009; Carter & Winn 2010), to name a 
few. In resonant configurations, even Eris-mass planets may 
be detectable via such Transit Timing Variations (TTVs) us- 
ing current ground-based technology for favorable systems 
(Carter et al. 2011a). Further, the projected angle between the 
spin axis of the star and the orbit of the planet can be mea- 
sured via spectroscopic observations during transit to provide 
diagnostic information of the physical processes at work in 
the migration of Hot Jupiters ("Oueloz et al.ll2000l; IWinn et aTl 
2005; Gaudi & Winn 2007; Triaud et all|2010|). Indeed, the 
combination of radial velocity and transit data provides the 
most thorough insights into a planetary system of any demon- 
strated planet detection and characterization method to date. 

Because of the wealth of information that can be derived 
from these planets, it is important to carefully consider the 
best ways to extract such information from the data sets we 
acquire such that results are limited by the data and can be 
compared in a manner that is as homogeneous and consistent 
as possible. 

The Markov Chain Monte Carlo m ethod has become a 
standard tool in exoplanet research (e.g. l Fordll2005t [Gregory! 
2005; Ford 2006; Winn et al. 2010), and has recently be- 
gun to be replaced with a faster, more elegant flavor: Dif- 
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ferential Evolution MCMC, DE-MC (e.g.. Iter BraakI 12001 
iJohnson et alJIMltlCarter etaniMTaiDovle et al.ll20lTlK 

Many public codes exist to model transit light curves 
and/or radi al velocities, inclu ding TAP (Gazaketal. 2012), 
JKTEBOP, fSouthworth 2008), FITSH (Pal 2012), PHEOBE 
dPrsa & Zwitter 2005J, VARTOOLS (Hartm an et al. 2008), 
Nightfalfl PhoS-T dMisljseLaLl |2012|), and Systemic 
jMeschiari et al.ll2009l) . 

The goal of this paper is to present an additional code, EX- 
OFAST, which we believe provides a valuable combination 
of features not present in any currently-public code: simul- 
taneous and self-consistent radial velocity, transit, and stel- 
lar parameter fitting; fast, robust, DE-MC characterization of 
errors; intuitive outputs, careful attention to realistic priors; 
non-interactive (easy to pipeline); well documented; and easy 
to install, use, and customize. Providing a completely general 
code that can fit any conceivable planetary phenomenon with- 
out modification is not practical. Rather than attempt to be 
comprehensive, our goal was to provide a modular, easily- 
extensible framework with a relatively straightforward but 
powerful example implementation for exoplanets that fits a 
single-planet system which has either or both RV and primary 
transit data. This framework and example implementation can 
be adapted to add additional effects as the data are able to con- 
strain them (e.g., TTVs, secondary eclipses), impose different 
priors, or even analyze completely different problems (e.g., 
Supemovae, Cepheids). 

While IDL is a proprietary language that is generally slower 
than low-level languages like C and Fortran, we chose to use 
this language because of the large library of existing code, 
the ease of development, and the fact that well-written IDL is 
comparable in speed to higher level languages for most (i.e., 
non-serial) applications. Of course, an MCMC code is nec- 
essarily serial (i.e., one cannot calculate an arbitrary step in 
the Markov Chain without first calculating the step before it), 
but the vast majority of the time spent is the model evaluation 
at each step, which has been carefully vectorized whenever 
possible. For those unable or unwilling to purchase an IDL 
license, the GNU Data Language (GDLj^ is an open-source 
compiler that claims full syntax compatibility with code up to 
IDL version 7.1. Our code does not work out of the box with 
GDL, but some users have gotten the core features working. 
Future updates will keep compatibility with GDL in mind. 

In ^ we provide a brief summary of the general problem of 
fitting data sets, an overview of how MCMC works, and why 
it is preferred over alternative methods of fitting data and es- 
timating uncertainties. The discussion here and routines cited 
are completely general to the problem of fitting any model to 
a data set and properly characterizing the uncertainties - it is 
not just applicable to exoplanets or even just astronomy. 

Next, we describe our specific procedure to fit RV data (§|3), 
including a detail ed di scussion of different ways to parameter- 
ize eccentricity ( ^3.4l i. and two potentially-common mistakes 
when using MCMC, both of which can inflate the measured 
eccentricity of intrinsically circular orbits significantly. We 
discuss our procedure to fit transit data in §3] and combined 
RV and transit data sets simultaneously in ^ Section|6]walks 
through an example fit of HAT-P-3b with real, public data to 
explain how the code works, what it does, and what its out- 
puts are. Our online interfaces to the most useful codes are 
presented in ^ Along with these online interfaces, all of 

^ http://www.hs.uni-hamburg.de/DE/Ins/PerAVichmann/Nightfall.html 
* http://gnudatalanguage.sourceforge.net/ 



the source code described here is available onlineQ. Those al- 
ready familiar with MCMC and the basics of light curve and 
RV modeling may find it most efficient to begin with the dis- 
cussion on eccentricity parameterization (§ 13.4) . then skip to 
^ where we discuss our unique approach to fitting RV and 
Transit data simultaneously. 

Appendix |A] demonstrates that the linear and/or quadratic 
limb darkening coefficients can be fit analytically, which can 
reduce the dimensionality of a non-linear solver, thereby dras- 
tically increasing its speed. Specific improvements to the 
Mandel & Agol (2002) code to calculate the quadratically 
limb-darkened flux during transit, including a factor of ^100 
improvement in speed that cuts the run time of typical RV 
and transit fit from an hour to a few minutes, are described 
in appendix IB] Appendix ICl discusses a problem with IDE's 
built-in random number generator and provides an alternative 
at a moderate increase in the overall run time. Appendix iDl 
discusses a way to interpret a negative eccentricity that pro- 
vides continuous models across the boundary at e=0. Lastly, 
appendix |E] discusses the execution time and identifies areas 
for future improvement. 

2. OVERVIEW OF THE PROBLEM 

2. 1 . Finding the best fit 

Given a data set, D, with uncertainties, a, we would like 
to generate a model, M, from a set of model parameters that 
describe the data. If we assume the uncertainties are Gaussian, 
then the probability of the data D given the model M, or the 
likelihood, .if, is given by 



P{D\M) = (27r) 



-«/2 



exp 



(1) 

where the subscript / corresponds to each of the n data points 
and s is an added scatter term. If we assume s is constant for 
each model, it can be absorbed by the error in each data point, 
the UkeUhood simplifies to 
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Therefore, assuming fixed uncertainties, finding the max- 
imum likelihood is equivalent to finding the model with the 
lowest x^- When the model is linear (i.e., can be written as a 
simple linear combination of known quantities with unknown 
coefficients), the can be minimized analytically and ex- 
actly to find ea ch of the co efficients (i.e., the parameters of 
the model) (see lGouldll2003l) . 

However, when the model is non-linear, such as for transits 
and radial velocities, we must determine the best fit param- 
eters which minimize numerically. Unfortunately, there 
are no generic algorithms to minimize the for a global 
parameter space - often, various tricks are required that are 
specific to the particular problem at hand. We will discuss 
the tricks specific to exoplanets in ^ and ^ that we use to 
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restrict the region of parameter space close to the global min- 
imum. Once we identify this region, there are many routines 
that can robustly find a local minimum. AMOEBA is a pop- 
ular non-linear solver that uses the downhil l simplex method 
to find local minima (Nelder & Mead" 1965). Given a starting 
point and stepping scale (which is approximately the range of 
parameter space it will consider), AMOEBA will crawl through 
parameter space to find the minimum, using the each 
step to determine its next step. This routine is very robust at 
finding local minima. 

IDL comes with its own built-in AMOEBA routine, but we 
discovered a bug that truncates the stepping scale to float- 
ing point precision, regardless of the data type initially given. 
This is detrimental when fitting parameters that require dou- 
ble precision (e.g., Julian Day), since the model will simply 
oscillate about the minimum and not converge. We provide 
a debugged version of this code, which now forces all step- 
ping scales to be double precision, as a new code in this suite, 
EXOFAST_AMOEBA. 

Another popular algorithm to find the loc al minimum 
is Le venberg-Marquardt (LM) (Levenberg 1 9441 iMarquardtl 
[T963h . which uses numerical derivatives to predict the mini- 
mum more precisely after each evaluation, and therefore re- 
quires fewer evaluations of the statistic (which is generally 
completely dominates the computation time). The downside 
is that, if the surface is not smooth, the numerical deriva- 
tives may be a poor predictor of the minimum and the fit will 
not be as robust. Markwardt (2009) published an extremely 
versatile and widely-used IDL implementation of the LM al- 
gorithm, called MP FIT. As expected, we found our debugged 
version of AMOEBA to be more robust than MP FIT, routinely 
finding as good and occasionally better values of x'^ but it 
also required about 10 times more model evaluations and was 
therefore about 10 times slower 

Once we have the best fit, we check the quality of the fit 
by examining the probability that our model has the that 
it does. If a model is a good description of the data, and the 
measurement uncertainties are uncorrected and properly es- 
timated, then the probability of getting the we do, P (x^), 
should be 0.5. In the the limit of infinite degrees of freedom, 
this is equivalent to saying the x} pei" degree of freedom, x3 
is unity. Even with as few as 2 degrees of freedom, the dif- 
ference between P (x^) = 0.5 and xt=\vs, only 20%, which 
is already better than we expect this method to be. While we 
do not wish to imply more accuracy than that by being too 
precise, we scale the errors such that P (x^) = 0.5 because it 
is precisely correct under the most naive assumption that our 
uncertainties are Gaussian and uncorrelated. 

Thus, if P (x^) deviates from 0.5 by an amount that is sig- 
nificantly larger than expected given the number of degrees of 
freedom, this implies that either the model does not properly 
describe the data (e.g., there is signal of another planet that 
has not been modeled), that the uncertainties have not been 
properly estimated (e.g., because of unrecognized systemat- 
ics), or that the data are not Gaussian distributed (e.g., there 
are large, non-Gaussian outliers). Unfortunately, it is often 
difficult to distinguish between these possibilities a priori. 

If one has reason to believe that the model properly de- 
scribes the data, but nevertheless finds that the P (x^) differs 
significantly from 0.5, the natural conclusion is that the uncer- 
tainties have been misestimated for the bulk of the data (often 
underestimated) or that there are a few non-Gaussian outliers. 
Parameters or parameter uncertainties derived from such data 



are likely to be biased. In this case, there is a strong motiva- 
tion to attempt to "correct" the data such that P (x^) ^ 0.5, 
thereby producing (hopefully) less biased parameters and pa- 
rameter uncertainties. 

A plausible procedure for "correcting" the uncertainties is 
as follows. First, one can search for and identify strong out- 
liers. If, after eliminating these outliers, one still does not 
find a satisfactory fit, the next step is to modify the uncertain- 
ties. There are generally three ways in which one can modify 
the uncertainties to P (x^) = 0.5: scale the uncertainties by a 
constant multiplicative factor, add a constant term in quadra- 
ture to the uncertainties, or both. The appropriate method to 
adopt depends in detail on why the uncertainties were mises- 
timated. However, if this was known, then it is likely that it 
would have been corrected in the first place. As a general rule, 
if there is an unaccounted systematic that is independent of 
the signal (e.g., stellar jitter), then the additional uncertainty 
term should be added in quadrature. If there is an error in the 
normalization or some calibration systematic (e.g., an error in 
the gain), it is more appropriate to multiply the uncertainties 
by a constant factor. The practical difference between these 
two approaches is usually minimal, but adding uncertainties 
in quadrature will tend to even out all of the uncertainties, 
whe reas multiplying w ill preserve the relative uncertainties. 

In lLee et all (l201lh . we used our code to fit radial veloc- 
ity data for a brown dwarf candidate from the MARVELS 
collaboration. We found for this candidate that the x3 for 
the native data and uncertainties was considerably larger than 
unity. We then tested many different permutations of elimi- 
nating outliers, scaling uncertainties, and adding uncertainty 
terms in quadrature, to force xi, = \, and assessed the effect 
of these different procedures on the resulting best-fit param- 
eters and uncertainties. We found no statistically significant 
difference between the various methods of altering the un- 
certainties. Partly motivated by this experience, our default 
procedure is to scale the uncertainties by a constant factor 
and we do not include a jitter term, as is common with RV 
fits. However, we recognize that this result is unlikely to be 
generic, and note that it is relatively straightforward to modify 
this procedure in our routines. 

Several alternatives to our method of error scaling have 
been suggested. It has been proposed to allow the uncertainty 
scaling (e.g. Gregory 2005 ) or another uncertainty term to add 
in quadrature (e.g. Ford 2006) to vary as a free parameter 
(e.g., the s term in equationfTli. More recently. ICarter & Winnl 
(2009) suggests a wavelet analysis method to fit the corre- 
lated noise component more robustly, and states that treating 
correlated noise as white noise like we do systematically un- 
derestimates the errors. Implementing a wavelet analysis is 
on our long list of eventual improvements, but in practice, the 
difference is r elatively small as long as the red noise is not 
dominant (see lCarter & Winiil (120091) for a discussion). 

If we consider multiple independent data sets from different 
sources, they are likely to have different systematics. There- 
fore, it is a good idea to fit each data set and scale the uncer- 
tainties independently, and then ensure that the resulting pa- 
rameters are consistent with one another before attempting to 
combine them. If there are large inconsistencies between the 
data sets, it is indicative of serious problems with either the 
model (neglected effects) or data (systematic uncertainties) 
and a simultaneous fit of all data sets should not be trusted. 
If they are consistent, we can find the best fit to the combined 
data set using a local minimum solver starting at the best-fit 
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values of one of the independent data sets. 

2.2. Finding the uncertainties: MCMC 

So far, we have discussed the process of evaluating the 
probability that a given data set D is described by a given 
model M, P{D\M). However, what we are actually interested 
in is the probability that a given model M is correct, given 
our data D, P{M\D). This probability depends not only on 
P{D\M), but also on our prior beliefs about the probability of 
a given model M, P{M), which are related by Bayes' theorem 

Here, P{D) is the probability of the data, which is given by the 
integration of P{D\M)P(M) over the parameter space encom- 
passed by the model M. Heuristically, Bayes' theorem can be 
thought of as providing a rigorous way of incorporating new 
data to revise initial beliefs. 

In principle, one is interested in evaluating P{M\D) for two 
purposes. First, a comparison of P{M\D) between two differ- 
ent models for the same data set can determine which model 
is more likely to be correct given the data, models, and pri- 
ors. Second, for a given model, P{M\D) can be used to de- 
termine the relative probability of different parameters of the 
model, also called the posterior probability density. In our 
case, where we are only considering different parameters of 
the same model, P{D) is constant, so we need not explicitly 
calculate it. 

Since there is zero phase space precisely at the best fit, 
it tells us nothing of the uncertainties of the model parame- 
ters. However, by evaluating the posterior probability density, 
P{M\D), we then determine the uncertainties of the model pa- 
rameters, e.g., by determining the range of a given parameter 
that encompasses some set fraction of the probability density. 
Once the priors, P{M), are specified, the task then becomes 
evaluating P{M\D). The Markov Chain Monte Carlo tech- 
nique provides an efficient method for doing this that allows 
for easy determination of median values, uncertainties, and 
covariances for the fitted model parameters, in addition to any 
parameters that can be derived from the model parameters. 
The MCMC technique is also attractive because it is based on 
the data as given, as opposed to bootstrap analyses which use 
simulated data to evaluate the uncertainties. 

We adopt the Metropolis-Hastings algorithm to sample 
P{M\D). We start with a set of trial parameters, and evalu- 
ate for this trial set. We then randomly choose a different 
set of parameters, and calculate for this set of parameters. 
The ratio of the likelihood, assuming the errors are constant 
(i.e., equation 121) for the new set of parameters relative to the 
initial set is given by 

^2/=^i=exp[x'(^''-x'(^^'l/2. 

We then draw a random number uniformly distributed be- 
tween and 1 . If the random number is greater than the like- 
lihood ratio, the model is rejected and we do not step there. 
Instead, we duplicate a copy of the previous position in the 
chain as the current step. If the random number is less than 
the likelihood ratio, we accept the new model. Note, when 
Ax^ < (i.e., the new model is a better fit), ^2/^1 is always 
greater than 1, and so the step will always be accepted. 

We repeat this process, stepping to a new region of param- 
eter space until we have a smooth distribution of values for 



each parameter The resultant density of steps is proportional 
to the posterior probability of each parameter, naturally re- 
sulting in a robust estimate of the median value and the 68% 
confidence interval. Again, we do not consider the absolute 
normalization, P{D), which one would need to do in order 
to consider the relative likelihood of models with a different 
parameterization. 

One of the most problematic aspects of using MCMC is 
determining an appropriate stepping scale and direction. Us- 
ing the proper length scale is key to speedy convergence. If 
the scale is too large, very few trial links will be chosen, and 
many models will be calculated unnecessarily. If the scale is 
too small, many links will be accepted but the adjacent links 
in the chain will be highly correlated with each other, and 
the resultant chain will not be "well-mixed," discussed below. 
Similarly, if the chains step in a correlated parameter set (a 
non-orthogonal direction), the chains are very likely to be re- 
jected because the effective step in the orthogonal space will 
be too large. 

The ide al step mimics the po steriori probability distribution 
precisely (iGelman et al.ll2003l) . Of course, this is not known a 
priori; it is exactly what we are trying to calculate. An elegant 
solution to this problem is t he Diff erential Evolution Markov 
Chain method of ter Braa^ (i2006l) . which runs many chains 
in parallel (equal to twice the number of free parameters), 
and uses the difference between the parameter values between 
two random chains to determine the next step. Since the en- 
semble of chains should be distributed according to the pos- 
terior probability, the difference between two random chains 
gives us the rough scale and direction of the step (i.e., the 
covariance matrix among all parameters), automatically tak- 
ing into account the correlations between parameters within 
each step and dramatically decreasing th e number of lin ks re- 
quired for the chains to be well-mixed, ter Braak' (2006 ) also 
adds a small, uniform deviate to each step to ensure the whole 
parameter space can be reached - otherwise, the steps could 
be cyclic, depending on the starting positions of each chain, 
leaving islands of unexplored parameter space. However, we 
found that, because the dynamic range of the ideal step sizes 
(i.e., the uncertainties) of different parameters can be arbi- 
trarily large depending on the units of the parameters, adding 
the same uniform deviate to the steps in each parameter is 
not general. That is, the log of the period has typical errors 
of order 10"^, so adding a random deviate of 10"^ completely 
dominates its step size, making the step too large and the chain 
inefficient. However, adding the same random deviate to the 
systemic velocity, which has typical errors of order 10 m/s, is 
completely negligible and does not serve its purpose to ade- 
quately mix the steps. 

Instead, we estimate the stepping scale by starting 
at the best-fit values and then varying each parame- 
ter individually until the Ax^ is one, in our program 
EXOFAST_GETMCMCSCALE. Then, we add a uniform devi- 
ate equal to a small fraction of that step size (we somewhat- 
arbitrarily picked 1/10). Even with highly-correlated param- 
eters, this algorithm yields an acceptance rate of 17% for a 
large number of_parameters - close to the optimal acceptance 
rate of ^ 20% dGelman et al.ll2003l) . When done this way, we 
can step in all parameters simultaneously without having to 
monitor the acceptance rates of each parameter individually 
because their step sizes are self-adjusting. 

In order to determine when our MCMC chains have con- 
verged, we roughly follow the guidelines set forth by iFordI 



EXOFAST 



5 



( 120061) . Each parameter in each of our chains begins at their 
best- fit value plus 5 times their corresponding step size (ap- 
proximately their uncertainty) times a Gaussian-distributed 
random number We take steps as described above in all pa- 
rameters simultaneously, until the chains have converged. We 
consider the chains to be converged when both the number of 
independent draws, T- is greater than 1000 and the Gelman- 
Rubin statistic, is less than 1.01 for all parameters. The 
independent draws and Gelman-Rubin statistic are calculated 
in EXOFAST_GELMANRUBIN and defined by Ford" (2006). 
This test must be passed 6 consecutive times - after passing 
these tests the first time, we take 1 percent more steps and 
check again. If it fails, we restart the convergence test. If it 
passes, we repeat, taking 2, 3, 4, and finally 5 percent more 
steps. When all tests have been passed consecutively for all 
parameters, we consider the chains well-mixed and stop. Fi- 
nally, we find the first point at which all chains have had a 
below the medi an and disca rd everything before that as 
the "burn-in" (e.g.. lTegmark et al.ll2 004: Knutson et al. 200^. 
This eliminates any bias due to the starting conditions. 

Due to the limitations of 32-bit operating systems, a sin- 
gle program (e.g., IDE) cannot allocate more than 2 GB of 
memory (~'260 million double-precision elements). Given 
the number of parameters, links, chains, and redundant copies 
of each that must be stored, it is relatively easy to reach this 
limitation with a moderately-sized chain before it converges. 
With a 64-bit machine (and IDE), it is possible to increase 
the maximum length of each chain dramatically, but manag- 
ing that volume of data is slow. Fortunately, due to the auto- 
correlations within the chains, we can "thin" them with little 
penalty in the accuracy of the results or the efficiency of con- 
vergence, but with a huge benefit to the manageability of the 
data set. Therefore, we include an option to thin the chain 
by a factor of Njhin, which will only keep every Nthin link 
in chain as the Markov chain is calculated. EXOFAST will 
estimate how many steps will be required for convergence af- 
ter 5% of the chain has been calculated. If it is not expected 
to be well-mixed at the conclusion of program, it will output 
a warning with a recommended thinning factor Specifying 
this thinning factor will help ensure that the final chain is con- 
verged, but with the unavoidable consequence of increasing 
the execution time by roughly a factor of Nthin- 

The resultant parameter distributions (i.e., the histogram of 
steps for each parameter) are proportional to the posterior 
probability of each parameter We quote the median of the 
distribution as the final value and 34% confidence interval on 
either side as the uncertainty. If there are large covariances 
or non-Gaussian distributions of parameters, the ensemble of 
median values, while individually most probable, may be a 
poor fit to the data. This also means that the quoted values 
for derived parameters are likely to be mathematically incon- 
sistent (e.g., the median values of e and oj, do not exactly 
imply the median value of ecoscj*), but they are statistically 
self-consistent. The true "best-fit" set of parameters can be 
extracted from the program outputs if desired, but we reiter- 
ate that the best-fit has zero phase space associated with it, so 
this is generally not useful. 

2.3. Priors 

One of the subtleties of a proper MCMC implementation is 
the correct choice of priors. We implicitly impose a prior that 
is uniform in each parameter we step, so we must be careful to 
consider our choice of parameterization carefully such that it 



matches our a priori theoretical expectation. Alternatively, we 
can weight the stepping probability by the Jacobian to trans- 
form into our desired parameterzation (Ford||2006j), or correct 
our posterior distributions afterward by importance sampling, 
as long as the particular prior chosen does not preclude vi- 
able regions of parameter space. Supporters of the MCMC 
method argue that all methods have some implied bias, and 
that MCMC is a good way to make that bias explicit. Ideally, 
our prior would represent the underlying distribution of that 
parameter given the selection effects inherent to the sample. 
In practice, this is a very difficult quantity to determine, and 
often times, the reason we are doing the measurement in the 
first place is to determine such broad statistics. 

Fortunately, when the data are highly constraining, the prior 
has little influence on the measured value. However, when 
the data are not highly-constraining, the prior can have a large 
impact on the measured values. 

2.4. Hybrid fitting: Non-linear and Linear Parameters 

In principle, it is possible to solve exactly for linear pa- 
rameters during each step of the Markov chain, which will 
reduce the dimensionality of Markov chain, making conver- 
gence much faster Unfortunately, we have run our program 
on the same data set, only changing whether or not we fit some 
subset of parameters linearly or non-linearly, and the uncer- 
tainties in the linear parameters were as much as 10 times 
smaller when fitted linearly at each step. This is not too sur- 
prising, since the uncertainties in a Markov chain come from 
the distribution of parameters at each step. If we always find 
the very best set of linear parameters given the particular set 
of non-linear parameters, we should expect the width of that 
distribution to be narrower than we would find if they were 
allowed to step randomly at each step, as they do when they 
are treated as non-linear parameters. 

Even if the fitted parameters are not intrinsic to the problem 
(i.e., their values and uncertainties do not directly affect any 
physical property we care to measure), it may be that their 
covariances with parameters we care about cause us to under- 
estimate their uncertainties. To deal with this, we simply fit 
all parameters non-linearly. It is likely possible to analytically 
compensate for this decreased scatter, and therefore recover 
the speed bene fit of analytically fitting many parameters (e.g. 
I An et al .1120021 Appendix D), but we leave this as a potential 
future improvement. 

We also note that this hybrid fitting is similar to detrend- 
ing data prior to fitting it, as is common in transit fits. In 
fact, detrending prior to fitting is significantly worse than 
this because, rather than underestimating the uncertainties and 
covariances of detrending parameters, it ignores them alto- 
gether. This may lead to, among other things, spurious claims 
of Transit Timing Variations (TTVs), as the most commonly- 
removed trend (a linear trend with time or airmass) is highly 
covariant with the transit time. 

3. RADIAE VEEOCITY 

3.1. Radial Velocity Model 

The gravitational interaction between a host star and orbit- 
ing planet results in a Doppler shift of the observed spectrum 
of the star of the form 

RV(t) = K [cos {9(t) + w,) + e cos ] + 7 + 7(f - fo) (6) 
where K is the radial velocity semi-amphtude, and is equal to 
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K = 



2t:G 



P{M^+MpY 



1/3 



Mp sin i 



(7) 



In equations |6] and |2l 0(t) is the true anomaly as a function 
of time, is the argument of periastron of the star's orbit 
measured from the ascending node to its periastrorQ, e is the 
orbital eccentricity, 7 is the systemic velocity (or often just an 
arbitrary instrumental offset), 7 is a systematic acceleration 
either due to an additional body in the system with a period 
much longer than the span of the observations, or systemat- 
ics in the data, and fo is an arbitrary zero point for the slope, 
which we define to be the mean of the of input times. 

The true anomaly, which is the angle between periastron 
and the planet, measured from the barycenter of the system. 



9{t) = 2arctan 



1+e 



-tan 



(8) 



where E{t) is the eccentric anomaly, given by Kepler's equa- 
tion. 



M(f) = £'(0-e sin (£(0), 



(9) 



as a function of the Mean anomaly, M{t). Unfortunately, this 
is a transcendental equation for which no analytic solution for 
E{t) exists. It must be solved numerically for a given M(t). 
We use EXOFAST_KEPLEREQ, a slightly improved version 
of a code writt en by Marc Buie and Joern Wilmfl which uses 
the method bv lMikkolal (II987I) . and in the case of high eccen- 
tricities, uses a Newton-Raphson method to refine the eccen- 
tric anomaly. Our improvement handles diabolical inputs that 
prevent convergence when angles differ by slightly more or 
less than 2tt. While these cases are relatively rare, this rou- 
tine is called hundreds of millions of times during the MCMC 
chain, and the original version almost always failed without 
our fix. 

The mean anomaly simply describes a uniformly flowing 
time and can be computed from the period of the orbit, P, and 
the time of periastron passage, ?>: 



M(f)=y(f-7». 



(10) 



In many instances, the reverse calculation is also required. 
That is, we would Uke to know the time, f, that the planet 
will be at a given true anomaly. For instance, there are many 
special times in an orbit that one may be interested in know- 
ing, like the time of periastron, Tp, the time of primary transit 
center, T rPT , the time of secondary eclipse center, Ts, the time 
when the star is at its ascending node Ta (when the RV is at a 
maximum), the time when the star is at its descending node. 
To (when the RV is at a minimum), or the time that the L4 
and L5 star-planet Lagrange points pass in front of the center 

* Throughout this paper, when we reference the argument of periastron, 
we refer to the argument of periastron of the star's orbit. Since we measure 
the radial velocity of the star's orbit, this is the often unspoken - but not 
completely universal - standard in the exoplanet literature. The argument of 
periastron for the planet, ujp differs from cj, by tt. As this definition can be 
somewhat counter-intuitive, we keep the subscript "*" to make it exphcit. 

' http://astro.uni-tuebingen.de/software/idl/aitlib/astro/keplereq.html 
The use of this terminology does not mean that the object transits - this 
is simply the predicted transit center if the inclination were favorable. This 
may more appropriately be called the time of (inferior) conj unction. 



of the star, Tia and Ti=,, respectively, which may be a favor- 
able t ime to look for transiting Trojan planets (iFord & Gaudil 
I2OO6I) . Each of these times correspond to a particular value of 
the true anomaly, given here: 

e{Tp) =0, 

e{Tc) =^/2-w„ 
e{Ts) =3^2 -w*, 

0(7i)=-a;*, (11) 
0{Td) =7r-a;*, 
9(Tia) =57r/6-a;*, 
d(TL5) =7r/6-a;*. 
Fortunately, this is much easier. We simply invert equation [8] 



E{t) = 2arctan 



1+e 



-tan 



2 



(12) 



plug E(t) into equation |9] and solve equation [TOl for t. Our 
routine for the reverse correction, EXOFAST_GETPHASE has 
keywords that can calculate the phases of each of the true 
anomalies described in equation [TT] or for an arbitrary true 
anomaly. 

While the is completely degenerate for a planet in a cir- 
cular orbit, when the orbit is fixed to be circular, we follow 
the convention that = 7r/2. This has the virtue that the ex- 
pected time of conjunction (or transit) occurs at "periastron", 
i.e., Tp = Tc. 

3.2. RV Parameterization 

The choice of parameterization is extremely important be- 
cause we implicitly impose priors that are uniform in each 
of these parameters. If these priors are not physically moti- 
vated, they may introduce biases in the values of the param- 
eters we infer from the MCMC. Often, these biases can be 
corrected, but if the particular parameterization a priori ex- 
cludes certain regions of parameter space, it cannot. Second, 
the choice of parameterization is important because highly- 
correlated parameter sets converge very slowly. Fortunately, 
this latter concern is largely addressed by the slightly more 
sophisticated, DE-MC algorithm. 

The parameterization we favor for radial velocities is 
logP,log/r, y^cosw*, Y^sinw,, rc,7, and 7. This particular 
parameterization differs from that suggested in Ford (2006^) 
in a few key ways. The parameterization of e and is dis- 
cussed in detail at the end of this section. We both use logP 
and log K, but the notion of imposing strict bounds outside of 
what we think is possible on each of these parameters to make 
them "normalizable" or "proper" priors (i.e., the integral over 
the allowed states is fi nite) is unnece s sary and somewhat mis- 
leading. On page 62. iGeknan et al.l (1200 3) state that we can 
obtain a proper result from an improper prior as long as the 
posterior distribution (i.e., our parameter distribution) is nor- 
malizable, but he warns that such "distributions must be in- 
terpreted with care - one must always check that the posterior 
distribution has a finite integral and a sensible form." Artifi- 
cially imposing boundaries outside of what can be reasonably 
expected does not free us from this responsibility. In either 
case, if our posterior distribution is unexpected, we must in- 
vestigate why. Further, by not imposing strict bounds, we 
simpHfy the code and ensure that we do not exclude solutions 
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that may actually be allowed. In practice, this should make 
no difference as long as the bounds chosen were sufficiently 
conservative so as not to bias the result. If the data provide 
no constraint, we would find that our posterior distribution 
was unbounded, but this would be obvious because our chains 
w ould not con verge . 

iFordI (120051) suggested parameterizing a time in the orbit 
with a mean anomaly, Mq, at some arbitrary zero-point, Tq. 
However, Mq is trivially related to Tp, 



Tp = To- 



MqP 
2tt ' 



(13) 



So, modulo comparatively minor covariances with P, it is just 
as covariant with e and as Tp is. Imagine, when e is very 
nearly zero, the periastron is poorly defined, and may swing 
wildly from point to point in the orbit at each step in the chain. 
This means that Tp, and therefore Mo, must randomly swing 
wildly in the same direction or the model would be out of 
phase with the data and the step would be rejected. This is the 
e ssence of wh y covariant parameterizations are inefficient. 

iFordI (I2006h suggested many other alternative parameter- 
izations to aide convergence, including UJ^, + Mq for low- 
eccentricity orbits, and + for high eccentricity orbits, 
where 9q is a an arbitrary zero point in the true anomaly. 
These approximate the angular position of the star relative to 
the plane of sky at a given reference time, and thus are an at- 
tempt to compensate for the poorly-constrained aj» (the angle 
between the angle on the sky and periastron). Looking back at 
equation[TTl however, we see that each of those special times 
correspond to a fixed angle along the orbit with respect to the 
plane of the sky with no approximation whatsoever Indeed, 
any true anomaly of the form C-w*, where C is a constant, 
is stationary on the sky. Since all such times are similar, we 
use Tc for its practical uses - Tq is the time we want to look 
for transits (and is the reason we first considered this family 
of parameterizations). 

In addition to the faster convergence time, there are other 
major advantages to this parameterization. Tc is usually 
much better constrained (smaller uncertainties) than 7>, we 
n o longer need to tune the parameterization for each system, 
as lFordl(l2006h recommends, and Tc is a parameter of general 
interest - either when the planet transits, or in the case of RV 
planets, when we may wish to search for transits. 

One complication of using Tc is that it could take on any 
value that differs by integer multiples of the period and it 
would have no effect on the derived model. However, the de- 
rived uncertainty of Tc, and its covariance with P, is strongly 
dependent on the choice of the zero-point. It is usually best 
constrained and least covariant closest to the error-weighted 
mean of the input times, and that is what we use to run the 
Markov Chain. 

An alternative parameterization, which we do not adopt 
but has its appeal, is 7c. i and Tcii the times of the first and 
last transit in the data set, instead of Tc and P. Since we 
almost always know the number of periods, Norbas, in be- 
tween with no ambiguity, P can be derived trivially at each 
step {P = iJc .2 - Tc ,\) / Norbits), and unlike Tc and P, Tc.\ and 
7c,2 are completely uncorrected. 

3.3. Radial Velocity Best Fit 

Finding the global minimum to a radial velocity data 
set can be extremely complicated because of the large vol- 
ume of parameter space with widely-separated local min- 



ima. Fortunately, this process can be greatly simplified if 
the orbit is circular, in which case the ecosw, term drops 
out of equation |6] and 9 simplifies to the the mean anomaly, 
9 = M = 2TT(t-Tp)/P. Then, the RV equation can be re-written 
as 



/27r(f-rp)\ f2Tr(t-Tp)\ 
RV(t)=Acos[ — ^ )+Bsin( — „ )+7+7(f-fo), 

(14) 

where A and B are arbitrary coefficients. In this formulation, 
A, B, 7, and 7 are all linearly related to the RV, and so for 
a given P, the values of all other parameters t hat minimize 
can be found analytically (e.g. lGouldll2003h . Thus, there 
is only one non-linear parameter, P, which can be quickly 
stepped through while the others are solved analytically at 
each step. From the best-fit values of A and B, the parame- 
ters of interest can be determined. 



K=y/A^+B^ 
P 

Tp = — (arctan(A,B) + 7r/2). 
27r 



(15) 



where arctan(A,B) represents arctan(B/Al]3^ and 7r/2 comes 
from the definition that = 7r/2 for circular orbits. This is the 
basis for the Lomb-Scargleperiodogram(Lomb 1976; Scarglg] 
IT981 . 

The optimal period spacing requires that we sample finely 
enough such that the difference in phase between the first and 
last observation changes by < 1/2 radian for each step dP. 
This is because when the phase changes by tt radians, the RV 
will have an opposite sign and give a poor fit to the data. This 
results in the criteria that dP < P^/(4ttT), where T is the du- 
ration of the observations. Typically, a scan through periods 
in this way will reveal several peaks in likelihood, due to the 
planetary orbit and aliases thereof. These provide good start- 
ing points for more rigorous, fully non-linear local min imiza- 
tion ro utines which include eccentricity. Wright & HowardI 
(l2009h suggest a parameterization that reduces the number of 
non-linear parameters of the full, Keplerian orbit from five 
per planet to three per planet, which would help robustly fit 
the full Keplerian solution. However, since we are only con- 
cerned with single-planet systems here, this method is not re- 
quired to quickly and robus tly fi nd the best fit, and our con- 
cern about hybrid fitting in ^2.4l trumps the benefit of this pa- 
rameterizaton during MCMC fits. 

Sometimes, because of aliases, multiple periods will pro- 
vide similarly good fits. In such cases, each period should be 
investigated individually, as AMOEBA is unable to find min- 
ima that are widely separated from the initial values. We do 
not yet employ the more so phisticated technique outlined by 
iDawson & FabryckyI (|20T3) to find the best periods among 
aUases, as this is usually unnecessary. When there is clearly a 
unique period associated with the best fit, AMOEBA will give 
us a robust result. Given adequate phase coverage over one 
or more complete orbits, this method can robustly fit most 
single-planet systems without any special effort. Those with 
very high eccentricities or poor phase coverage can also often 
be fit, but sometimes need slight adjustments to the default 

' ' Mathematically, the arc tangent ranges from —n/2 to 7r/2, but when the 
sign of the numerator and denominator are known independently, the precise 
inverse mapping to the full range -tt to tt can be determined. This notation is 
commonly used in computer programming, and is sometimes called atan2. 
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period range or number of minima t o exp lore. For exam- 
ple, fitting the data from lWinn et"!!] (120091) for HD 80606b 
(e ^ 0.93) required searching the 100 likeliest peaks in the 
Lomb-Scargle Periodogram, whereas the default is 5. 

3.4. Eccentricity 

Different groups have chosen to parameterize the eccen- 
tricity, e, and argument of periastron, in several different 
ways. This is done ostensibly for three different reasons: to 
apply the appropriate prior, make the chains converge faster, 
and ease and simplicity in programming. It is unclear what 
a good, physically-motivated prior for the eccentricity of a 
planet is, particularly given the complications of tidal circu- 
larization. Still, it is commonly assumed that a uniform prior 
in e is best, and we do the same. 

Therefore, we explored the advantages of the most common 
ways to parameterize the eccentricity, specifically, ecosw* 
and esino;,; e and a;, where < e < 1 and -tt < < tt, 
and ^/ecoso;* and y/esinuj^,. We found subtle differences in 
the different parameterizations, which are worth further ex- 
ploration. 

In Appendix|D] we discuss a potentially-useful new param- 
eterization, e and oj* where -1 < e < 1 and is unbounded. 
We explain why the models are continuous across the e = 
boundary, but ultimately, we found it inferior to our preferred 
parameterization and do not endorse it for our particular ap- 
pUcation. 

3.4.1. Eccentricity Prior 

To ensure our prior distribution is what we expect (i.e., uni- 
form), we computed the prior distributions of e for each of the 
above parameterizations by running a standard Markov Chain, 
but setting the to 1 as long as the chain made an allowed 
step. If the chain makes a disallowed step, we set the to 
infinity. Since this results in a uniform likelihood surface in 
the allowed region, the posterior distribution is equal to the 
prior distribution. 

When parameterizing in ecosoj* and esinoj*, we solve for 
e and a;^. at each step , and set the to infinity if e > 1. As 
noted by'FordI (l2006h . and shown in figure[T]in red, we clearly 
see a linear prior in e. Ford said that we must correct for 
the linear prior in e during an MCMC fit by weighting the 
stepping probability by the Jacobian of the transformation be- 
tween the parameters in which we step and the parameters we 
desire to be uniform. In the case of stepping in ecosoj* and 
esinoj*, the Jacobian to transform to e and is e, so we must 
weight the stepping probability by e,_i / e,, where the subscript 
/ denotes the current link in the chain. This will preferentially 
reject steps to higher eccentricity and nearly recovers the uni- 
form prior in e, as shown in green in Figure [T] However, due 
to the singularity at e = 0, there is a very slight overcorrection 
at e = 0. 

An other popular parameterization (e.g., l Anderson et al.l 
is Y^cosw* and y/esinu^,. Since the Jacobian to e and 
is a constant 1/2, it can be ignored when computing the 
stepping probability, and it naturally recovers a uniform prior 
in e and w,, as shown in black in Figure [T] This works by es- 
sentially taking smaller steps in eccentricity near zero which 
exactly compensate for the smaller area at e = in ecoso;,, 
esinoj* space. 

Stepping directly in e and w» obviously results in a uniform 
prior (shown in blue), where we set the x^ to infinity if our 
step took us to any of e < 0, e > 1, < -tt, or > tt. 




e 



Fig. 1. — The prior distributions of e wliile stepping in y^cosw* and 
Y^sintj* (black), ecosw* and «sina;» without connecting for the linear prior 
(red), with correcting for the linear prior (green), and stepping in e and 1.1;, 
directly (blue). We note that the y/ecosuit,, -y^sin and e, parameterza- 
tions correctly reproduce the uniform prior, but the ecoso;,, esintj, without 
correcting for the piior shows a clear linear trend and is obviously wrong. 
Even correcting for the prior, there is a slight overcorrection at e = 0. 



3.4.2. Convergence Time 

The primary motivation often cited for using a different pa- 
rameterization of e is that the convergence tim e is reduced . To 
test this, we simulated 100 data sets similar to iFordI (120061) . In 
particular, we chose 7 = 500 m/s, P = 3.223 days, = 53°, 
and K = 50 m/s, (7 was fixed at zero) with 80 evenly spaced 
data points over a span of 30 periods and Gaussian random 
uncertainty wi th a 1 -sigma width of VS m/s. The major dif- 
ferences from iFordI (2006) are that our times were evenly 
distributed rather than distributed according to the observing 
times of the California and Carnegie planet search program, 
and we input Hot-Jupiter-like parameters for those which 
were not explicitly stated. Neither of these should have an 
appreciable impact on our comparison. 

We fit these 100 simulated data sets as described in ^3.31 
with each of the three eccentricity parameterizations, repeat- 
ing the procedure for each of the 15 eccentricities listed in 
Table [U (for a total of 4500 fits). The uncertainty in the ec- 
centricity, as found by our Markov Chain, was approximately 
0.007 in each case, so our steps in eccentricity roughly cor- 
respond to to 10 sigma significant eccentricity. For a more 
direct comparison with Fq3 (l2006l) . we also include eccen- 
tricities of 0.01, 0.1, 0.5 and 0.8. Each set of 100 simulated 
RV curves had the same Gaussian "random" noise to make 
the comparison more robust. 

We then recorded the number of steps each fit took until 
the chain was well-mixed, according to the criteria outlined 
in ^2.21 Table [T] shows the log of the median number of to- 
tal accepted steps in all chains until convergence for each of 
the four parameterizations of eccentricity, as a function of ec- 
centricity, in addition to the be st value from all parameteriza- 
tions proposed by FordI (120061) . where applicable. Since the 
execution time is proportional to the number of steps in the 
chain (plus small overheads), this is a convenient, computer- 
independent way of determining how efficient the Markov 
Chain is. For reference, for the e = case parameterized as 
Y^coso;*, y/esinuj^,, the fit took about 15 seconds on a stan- 
dard desktop computer purchased in 2009. 

For moderate eccentricities, ecosw*, esino;, was slightly 
faster than y^coso;*, v^sinw*. Both were nearly twice as 



EXOFAST 



9 



TABLE 1 

The log of the number of steps in all chains 
before convergence for the 3 different 
ecc entr icity parameterizations discussed 
in j33.5l for many values of eccentricity, 

ALONG WITH THE BEST VALUES FROM'FOr1| 
iiOOSh WHERE APPLICABLE. THE 
e COS CJ, PARAMETERIZATION HAS BEEN CORRECTED 
FOR THE LINEAR PRIOR. 











Ford f2006) 


0.000 


4.62 


4.62 


4.n 




0.007 


4.58 


4.62 


4.76 




0.010 


4.57 


4.62 


4.76 


5.2 


0.014 


4.57 


4.64 


4.76 




0.021 


4.53 


4.62 


4.76 




0.028 


4.53 


4.57 


4.62 




0.035 


4.53 


4.57 


4.57 




0.042 


4.53 


4.54 


4.57 




0.049 


4.53 


4.53 


4.57 




0.056 


4.53 


4.53 


4.57 




0.063 


4.53 


4.57 


4.53 




0.070 


4.53 


4.53 


4.57 




0.100 


4.53 


4.53 


4.53 


3.9 


0.500 


4.53 


4.53 


4.53 


4.2 


0.800 
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fast as e, uj^ at low eccentricities. Even compared to iFordI 
(|2006), which uses a significantly more complicated (system- 
dependent) parameterization, our lowest eccentricity case was 
about four times faster, and our highest eccentricity cas e was 
about 50% faster However, at moderate eccentricities, iFordI 
(|2006) was 2-4 times faster Given the consistency of the 
number of steps we took as a function of eccentricity and the 
relatively larger variability of Ford's, some of both our ob- 
served benefit and deficiency may be due to random variabil- 
ity of Ford's chains. To give the reader an idea in the variabil- 
ity of our chains, our best chains (as opposed to the median 
values quoted in the table) at e = 0.1 and e = 0.5 took log 
4.31 and 4.35 steps, respectively, while our worst chains for 
e = 0.01 and e = 0.8 took log 4.89 and 4.82 steps, respectively. 

Despite requiring more chains (twice the number of fitted 
parameters, or 12 in this case), a significant fraction of the 
advantage we observe relative to Ford (2006) (who used 10 
chains) is due to the DE-MC algorithm. However, using a 
standard implementation of the Markov Chain, we were still 
faster than Ford in the low eccentricity cases due to our use 
of 7c rather than the various combinations of 7p, Mq, and uj^, 
(see ^O l. 

The final consideration when picking a parameterization 
are the practical advantages of implementing each. We did 
not imple ment the sy stem-dependent parameterizations sug- 
gested by iFordI (120061) . From a practical sense, this is clearly 
the most difficult, though it may be worth it in the moderate- 
eccentricity cases, given the results in Table [T] If desired, this 
can be done with relatively minor changes to the code. 

Stepping directly in e and is intuitive, but dealing with 
a periodic angular parameter introduces a number of special 
cases. In particular, the periodic boundary of can confuse 
the AMOEBA algorithm and it may not find the best-fit value. 
During the MCMC fit, we must be careful to take the modulo 
whenever it crosses a periodic boundary (i.e., ±7r) - other- 
wise, it would be free to jump between equivalent, widely- 
separated minima. However, the DE-MC algorithm would fail 
if the preferred value were near the boundary so that it could 
draw steps from both sides to determine the step size. Further, 
we may get unlucky and find that our probability distribution 



function lies on a boundary or, when it is poorly constrained, 
it is possible for a significant amount of power to span the en- 
tire range of 2-k. In either of these cases, the median value, 
which is required to calculate the convergence criteria and is 
often used as the final value, would be heavily biased. In order 
to account for this, we must first center the distribution about 
the mode, such that the values are within the range mode ±7r 
before we calculate the median. Additionally, our scheme of 
finding the appropriate step size would fail if the angle is so 
poorly-constrained that no value of produced a Ax^ = 1- 

Stepping in ecosoj* and esinw* eliminates this complicat- 
ing angular value during the Markov chain, but introduces an 
arguably more complex requirement to deal with the Jacobian. 
The routine is required to return a determinant (even if it 
is 1 for no transformation) to make the priors more obvious 
to the end-user. After this, the MCMC routine handles the 
determinant weighting transparently in order to transform to 
the desired prior So with our code, using a Jacobian is triv- 
ial to deal with. However, as seen in Figure [T] this does not 
completely correct the prior 

Stepping in -yicosw* and -yesinw, eliminates the Jaco- 
bian and frees us from most of the burden of dealing with 
angular parameters, and is therefore practically the simplest. 
Additionally, it is comparable or faster than other parameteri- 
zations, and recovers a precisely uniform prior in e. For these 
reasons, we use it in EXOFAST. 

3.5. Eccentricity Bias 

It has long been understood that there is a bias against low 
eccentricities in binary systems. Such a bias is extremely im- 
portant to understand, as many of the observed systems are 
expected to be tidally circularized. If they are not, it has 
profound implications for our understanding of tides (and the 
tidal Q factor), the existence of additional bodies in the sys- 
tem which may be perturbing their orbits, and the formation 
and evolution of planets as a whole. 

3.5.1. Lucy-Sweeney bias 

The bias agains t e = t hat most people are aware of was 
first quantified by iLucy & Sweeney (1971) in the case of bi- 
nary stars, and is due to the fact that there is zero phase space 
at exactly e = 0, and therefore observational uncertainties will 
produce a best fit that is biased toward a positive value, even 
for intrinsically circular orbits. They say that, in order to 
measure a non-zero eccentricity with a 95% confidence, one 
must measure a result of e > 2.45(7^, rather than the naively- 
expected e > 2CT(., where is the standard deviation of the 
eccentricities. 

We simulated 100,000 data sets of intrinsically circula r or- 
bits with different noise like those described in ^3.4.21 and 
found the best-fit eccentricity using AMOEBA for each. The 
resultant histogram of best-fit eccentricities is shown in fig- 
ure |2] in cyan. This is the eccentricity distribution one would 
measure from a boots trap analysis, similar to that described in 
iLaughhn et alj (120051) . and clearly shows this deficit at e = 0. 

For comparison, we also plot the c ombine d PDFs of the 
100 trials of MCMC fits described in ^3A2l for each of the 
parameterizations of eccentricity. We clearly see the prob- 
lem of using a linear prior (red). The other distributions look 
similar, but a close inspection shows the ecosoj^,, esincj* dis- 
tribution is slightly biased high at e = because of the slight 
bias in the prior distribution. Fortunately, the difference be- 
tween the parameterizations is negligible relative to the width 
of the Gaussian, so it is of Httle practical importance. 
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Fig. 2. — The combined eccentricity PDFs for all 100 intrinsically circular 
orbits with the different parameterizations (legend same as figure[T), plus the 
distribution of best-fit values from 100,000 amoeba fits (cyan), demonstrating 
the problem with bootstrap analyses of bounded parameters, the linear prior, 
and the very sUght problem with the corrected ecoso;*, esinuit paramteriza- 
tion. 



Now a huge advantage of MCMC becomes apparent: in- 
stead of looking at simulated permutations of the data and 
finding the best fit (which has zero likelihood because it is 
infinitesimally small) like a bootstrap or prayer bead analy- 
sis does, MCMC considers the data as is. It is clear that the 
PDFs from the MCMC do not suffer from this bias to nearly 
the same extent, but there is still the matter of how to summa- 
rize such a non-Gaussian distribution. Obviously, the standard 
method of quoting a median value and a 68% confidence in- 
terval provides a misleading, marginally-significant, non-zero 
eccentricity - even the absolute value of a Gaussian peaked 
at zero has a median value of 0.67 sigma. Instead, we could 
fit a 3-parameter Gaussian to the PDF (normalization, zero 
point, and width) and use the zero point as the likely value 
and the width as its uncertainty. In principle, we can actu- 
ally infer a negative eccentricity, dramatically reducing the 
Lucy-Sweeney bias. Alternatively, we can quote an upper 
limit which does not assume the PDF is Gaussian, but does 
not give us a likely value or uncertainty. In the end, however, 
there is no substitute for a visual inspection of the PDF. By 
default, EXOFAST simply quotes the median values and 68% 
confidence interval as with all other parameters. Therefore, it 
is up to the user to inspect the PDF a nd assess the significance 
of the output eccentricity. Recently, iLucyl (120 12h introduced 
a new Bayesian method to evaluate the robustness of a mea- 
sured eccentricity, which may help. 

To see how this bias evolves with the significance of eccen- 
tricity, we calculated both the median and fitted values of the 
eccentricity of each of the 100 fits and averaged them together 
in order to reduce the statistical uncertainty and flesh out the 
bias. This mean of medians is plotted as a function of intrinsic 
eccentricity in figure |3] in solid lines, and the fitted Gaussian 
zero-points are shown as dashed lines for each of the the pa- 
rameterizations described above. A non-biased result would 
fall on the dotted line. As expected from the Lucy-Sweeney 
bias, we over-estimate the eccentricity of orbits with intrin- 
sically small eccentricities. The uncertainty in each value is 
roughly 0.007. 

The maximum measured eccentricity deviation out the 100 
trials was slightly more than 3(j- slightly more than we would 
expect by chance. In the real world with real data, system- 
atic uncertainties, unmodeled effects, or outliers will all tend 



Fig. 3. — The difference between the measured and intrinsic eccentricity as 
a function of the intrinsic eccentricity, using the same parameterizations as 
before (see Figure[T]for legend). The solid lines are the typical median value, 
while the dashed lines are the zero point of a 3-parameter gaussian fit to the 
PDF. We clearly see the effect of the Lucy-Sweeney bias at low eccentricities, 
and the additional bias due to the linear prior for all eccentricities. The values 
plotted are the average of the median/fitted values for all 100 fits of simulated 
data. The statistical uncertainties for each of the 100 fits was roughly 0.007 at 
each point. The small, systematic offset between the measured and intrinsic 
eccentricities at high eccentricites for all methods is within the uncertainties 
given we only ran 100 trials. 



to make the eccentricity appear even larger than this data set 
si mulated with white n oise. 

iLaughlin et al.l (I2005i) pointed out this bias in the case of 
planetary sy stems, though they in correctly called it a Lutz- 
Kelker bias (Lu tz & Kelke3ll973l) . The Lutz-Kelker bias is 
actually a volume effect described in the case of trigonomet- 
ric parallaxes. While parallax is a positive-definite parameter 
which suffers from a Lucy-Sweeney bias too, the Lutz-Kelker 
bias exists at all values of parallaxes. It states that, due to 
observational uncertainties and the fact that the number den- 
sity of stars is larger for those with a smaller parallax, more 
stars tend to have a true parallax that is smaller than what is 
measured. 

3.5.2. Linear Prior 



As shown in ^3.4. II if we step in ecosaj„ and esincj*, we 
introduce a linear prior This prior is clearly not supported by 
the observations of short-period planets to date, so failing to 
correct for it will therefore lead to a significant (up to ~ 5a) 
bias in the inferred eccentricity, as shown in Figure |3] in red. 
Since there is very little difference in the convergence time, 
many may find it easier to step in ^/ecosw* and ^/esmuJ^,, as 
we do. 

3.5.3. Metropolis-Hastings algorithm 

Another bias comes from an incorrect, but potentially- 
common, implementation of the Metropolis-Hastings algo- 
rithm. As we discussed in ^2.21 when we reject a step, we 
must make a copy of the previous step in its place. Since the 
acceptance rate is ideally around 20%, we will end up with 
each step copied, on average, five times. This algorithm is 
somewhat unintuitive: we might think we would end up with 
huge spikes in the parameter distributions where the chain got 
stuck (fortunately, not true), or that making 5 copies of each 
s tep is wasteful (and it is; see the discussion on thinning in 
^2.2\ . Instead, we might think that we should not copy the 
previous step. The effect of making this mistake is minimal 
for unbounded parameters, which makes it difficult to identify 
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with typical sanity checks. However, this misleading intuition 
can introduce a significant bias that guides fits away from any 
hard boundary, such as e = 0. Unfortunately, an obsolete ver- 
sion of an MCMC code which was distributed with the IDL 
astronomy library made this exact mistakepl Those who use 
this code, model their own codes from it, or make the same 
intuitive mistake will all suffer from this bias. 

A more subtle way to make essentially the same error is in 
the boundary handling. The proper meaning of a boundary 
(e.g., e = 0) is that the model's likelihood is zero (or the 
is infinite) beyond it. However, we cannot a priori restrict the 
model to step only in bounds. While it may seem more effi- 
cient not to allow the Markov chain to step out of bounds, we 
must allow the Markov chain to go out of bounds, get rejected, 
and make a copy of its previous step in the process. We make 
these boundary conditions intuitive and fast in our routines 
by checking them first and returning an infinite x ^ if it is out 
of bounds. The MCMC routine then automatically assigns a 
zero likelihood to this step and will always reject it, making 
a copy of the previous step in the process and preserving the 
meaning of boundaries. 

To demonstrate why copying this step is necessary, we re- 
peat the exercise from ^3.4.11 but without copying the previ- 
ous step when a step is rejected. The resultant prior distribu- 
tions of eccentricity are shown in Figure H] which shows the 
prior probability is strongly attenuated near imposed bound- 
aries. This, in turn, will bias the inferred values away from 
said boundaries. The depth of this attenuated region is pro- 
portional to the step size, which is typically equal to the un- 
certainty in the parameter We note that the affected region 
plotted here is exaggerated because the DE-MC code auto- 
matically chooses a large stepsize to efficiently fill all of the 
likelihood space. However, for values near (< 3 times the step 
size) a boundary, the prior probabilities are still significantly 
impacted by this bias. 
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Fig. 4. — The prior distribution of eccentricities for each parameterization, 
when the incoiTect implementation of the Metropolis-Hastings algoiithm is 
used, as described in the text, with the same legend as in Figure [T] The 
detailed effect this eiTor has depends on the particular parameterization of 
eccentricity, and the step size, which is ideally equal to the uncertainty. In this 
plot, the attenuation region is exaggerated because the differential evolution 
code picks a step size that is large enough to efficiently fill all of the allowed 
likelihood space, and thus is attenuated by both boundaries at once. However, 
this attenuation is significant as long as the median value is within a few 
stepsizes of a boundary. This shows a strong a priori bias against boundaries 
like e = when the MH algorithm is improperly implemented. 

The cun'ent version, distributed with the most recent IDL Astronomy 
library, coiTects this error, but the old version is still readily available on the 
web, six years after it was fixed 



Unfortunately, it is difficult to say how prevalent these latter 
two biases may or may not be in the literature, or, in the case 
of the improper MH algorithm, even how significant it is if we 
know it is present, since the bias depends on the parameteriza- 
tion and even the step size. The larger the step size, the larger 
the bias (due to the larger attenuated prior region). While the 
ideal step size is roughly equal to the one-sigma uncertainty, 
there is no guarantee that a suboptimal step size was not used, 
because it is not supposed to matter Nevertheless, some com- 
bination of these latter two biases may at least partly explain 
the large number of nominally-significant eccentricities, even 
after accounting for the Lucy-Sweeney bias, for systems that 
are expected to be tidally circularized. 

4. TRANSIT FITTING 

4.1. Transit Model 

A primary transit occurs when a planet passes in front of 
its star and blocks a portion of its light for a period of time. 
We monitor the star's brightness during this time by taking re- 
peated exposures, and then comparing the target star's bright- 
ness to an ensemble of comparison stars. If the star were uni- 
formly bright, the relative flux we would see during transit 
would be: 

F(t) = Fo[l-X] (16) 

where Fq is th e baseline flux a nd analytic equations for A'' 
are defined in iMandel & Agoll (Eq. 1, 2002). A*" is solely 
a function of the transit geometry, Rp/R^, = p, which is the 
radius of the planet in stellar radii and r/R^ =z, which is the 
projected distance from the center of the planet to the center 
of the star, in stellar radii and is a function of time. 

4.2. Limb Darkening 

In reality, stars are not uniformly bright - for typical broad- 
band optical/NlR filters, their apparent brightness falls toward 
the limb of the star, which is an effect called limb darkening. 
For main sequence stars, the intensity of the star, /, is well- 
described as functions of /i = cosf?,, where 0, is the angle 
between the observer and the normal vector on the surface of 
the star - that is, /i = 1 at the center of the star, where the 
normal vector points directly at the observer, and /i = at the 
limb of the star, where the normal vector is perpendicular to 
the observer's line of sight. 

There are many different ways to parameterize the intensity 
of the star We will discuss the most commonly-used laws for 
transiting planets. The linear limb darkening law, 

|^ = l-Mo(l-/i), (17) 

where mo is a limb-darkening coefficient, was the first 
obvious choice, but it quickly became clear it was in- 
sufficient to describe real surface brightness profiles (e.g. 
Klinglesmith & Sobieski 1970). For the precision of many 
ground based transits, the linear law is still sufficient, but the 
light curves from Hubble Space Telescope for HD 209458b 
showed the linear limb-darkening law to be inadequ ate for 
high-precision transit light curves ( Brown et al.ll200lT) . Thus, 
many have adopted a quadratic limb darkening law of the 
form: 

^=l-Ml(l-//)-M2(l-M)^. (18) 
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iMandel & Agoll (l2002h state that the quadratic Umb dark- 
ening law is sufficient to describe transit Ught curves with a 
precision of 10~^(p/0.1)^ - a precision that has never been 
achieved from the ground. This was confirmed empirically by 
ISouthw ortii' ('2008'). who showed that the quadratic limb dark- 
ening law was sufficient for the quality of data then achiev- 
able. However, the accuracy of the quadratic law is worse than 
the 20 parts per million that is achieved routinely from Kepler 
jKoch et al. 2010) for large planets {p > 0.04). For these plan- 
ets, Kepler must therefore use a non-linear limb darkening law 
of the form, 



ujp at this point - while using ujp makes more intuitive sense, 
we feel the consistent use of one value for the argument of 
periastron reduces the chance of accidentally misapplying one 
or the other, and a;, is already widely in use. 

Z is along the line of sight, where +Z is toward the observer, 
and the X-Y plane is the plane of the sky. Neither transits 
nor RV can constrain the longitude of the ascending node, f2, 
which is the angle from North to the ascending node, mea- 
sured counterclockwise (i.e., the rotation of the X-Y plane), 
but for completeness, the orientation with respect to an ob- 
server on Earth is 



1(1) 



l-fli(l-/i'''2)-a2(l-Ai)-fl3(l-M^''^)-fl4(l- 



lA 

(19) 

Unfortunately, our theoretical predictions of the coeffi- 
cients has proven ins ufficient for very precise light curves (e.g 
iKnutson et alj |2007), and either the limb-darkening needs to 
be fit by the data, or perhaps newer, more precise models 
based on 3D hydr odynamical models of stellar atmospheres 
may be sufficient dHavek et al.l 120121) . Since the transit flux 
given by the quadratic Umb-darkening law is applicable to all 
but the most precise light curves and is significantly faster to 
compute than the non-linearly limb-darkened transit flux, we 
limit our discussion to the quadratic limb darkening law. Note 
that in the discussion that follows, we can reproduce the linear 
la w precisely by fi xi ng to be zero. 

IMandel & Agoll (12002!) give the quadratically Umb- 
darkened flux during transit as: 



F(t) = Fo{ 1- 



( 1 - M I - 2u2)y + (Ml + 2U2) [X'^+^Oip- Z)] - U2T]'' 



1 -M1/3-M2/6 



where A"', and 77'' are given in table 1 of IMandel & Agoll 
(I2OO2I) for all possible geometries. Like A*", A'' and rj'^ only 
depend on p and z- 8 is a step function equal to 1 where 
p> z and elsewhere. 

As a side note, in Appendix [A] we show that both the 
quadratic and linear limb darkening flux during transit can be 
can be written as linear combinations of analytic functions. 
Therefore, the limb darkening coefficients can be solved ana- 
lytically for fixed p and z. 

4.3. Planetary Path 

Given these analytic expressions, generating a model 
lightcurve becomes a matter of computing z for all times, 
which is similar to computing the RV. First, we calculate 
the true anomaly in the same way as before (i.e., using Equa- 
tions [8j|9l and [Toll. Then, the three-space coordinates of the 
planet's position relative to the star, as seen from Earth, are 



(1- 



l+ecos9{t) 
X =-rcos{6(t) + uj^,) 
Y =- rsin(6(t) + uj^)cosi 
Z =r sin (d(t) + w,) sin ; 



(21) 



where r is the distance from the center of the star to the center 
of the planet as a function of time, a is the semi-major axis, 
and iJ* is the stellar radius. Some have opted to mix and 



X' =-Xcosn + Ysmn 
Y' =-Xsmn-Ycosn 

z'=z 



(22) 



where -X' is East and +Y' is North. For concreteness, we as- 
sume f7 = 180°, which means X = X\Y = Y', and Z = Z'. This 
implies that, during primary transit, the planet moves from -X 
to +X and at X = 0, F is equal to the opposite of the impact 
parameter, -b. Finally, 



z=VX^ + Y\ 



(23) 



where z is in un i ts of s tellar radii, exactly as required by the 
IMandel & Agoll (12002 *) code. We must take care to note the 
sign of Z - both transits and occultations occur when z<l+p, 
but it is a primary transit when Z > 0, and a secondary eclipse 
when Z < 0. 

The calculation of the planetary path is done in our pro- 
V gram EXOFAST_GETB and includes the general handling of 
Q, which would be useful if one would like to include astro- 
' metric measurements. 

While we do not support fitting the secondary eclipse, the 
calculation of its model flux is identical save a couple minor 
substitutions: p becomes 1/p, z becomes z/p, and the limb 
darkening of the planet can be ignoredQ The resultant model 
is the observed flux from the planet as it is blocked by the 
star. In general, this will be some combination of thermal and 
reflected light, but without knowing the temperature, albedo, 
and thermal redistribution (e.g., from a phase curve), the two 
cannot be distinguished, and thus only one additional param- 
eter for each observed bandpass is required for the normaliza- 
tion of the planetary flux (i.e., the eclipse depth). This nor- 
malization is a linear parameter, but note the warning above 
about hybrid linear and non-linear fits. We stress that this is 
not the same as fitting different values of p for the primary 
transit and secondary eclipse. The shape of the ingress/egress 
and the duration of the eclipse require p to be the ratio of radii, 
not just the square root of the depth. 

4.4. Parameterization 

We also need to define the parameterization of the transit 
light curve, which is much less obvious and has been done 
many different ways in the literature. Most, however, have 
agreed upon 7c, Fo, logP, and the quadratic limb darkening 
parameters ui and U2- While the eccentricity is also required 
to derive the model, it is often fixed at the best-fit values 
from the RV. The remaining parameters, which determine 
the shape of the transit, have no universally-accepted param- 

Tliis does, however, require the bug fixes described in Appendix |B.2| 
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eterization, likely because each parameterization has its own 

a dvantages and disadvantage s. 

ISeager & Mallen-OrnelasI (120031) suggested Tj, the total 
duration of the transit (first to fourth contact), 7>, the du- 
ration of the flat part of the transit (second to third con- 
tact), and the transit dept h, 5 (for non-grazin g transits with no 
limb-darkening, 5 = p^). ICarter et"aD (12008b suggested that a 
less-correlated parameterization would be r, the duration of 
ingress or egress (i.e., from first to second contact or third 
to fourth contact), T, the duration from mid-ingress to mid- 
egress, and S. 

Unfortunately, both of those parameterizations are unde- 
fined for grazing geometries, and therefore it is impossible 
to correct the prior distributions to be physical for all geome- 
tries. Grazing transits would be poorly fit, and near-grazing 
geometries may be unfairly biased by the parameterization 
(See ICarter et al.l (120081) for a discussion). For this reason, 
we advocate a more physically-motivated parameterization: 
log(a/R^,), cos;, and p. The advantage to this parameteri- 
zation is that it intuitively imposes reasonable priors on the 
physical parameters, and is well-defined for all geometries. 
The disadvantage is that they are further removed from what 
is actually measured (the shape of the transit) and the covari- 
ances between these parameters is large. Fortunately, the DE- 
MC algorithm automatically takes the covariances into ac- 
count, so this is much less important. 

4.5. Other biases 

A Lucy-Sweeney-like bias exists for all bounded param- 
eters, which includes cos/ (in the case of transiting planets 
when we cannot distinguish between ±cos;), and p. Unfor- 
tunately, the Lucy-Sweeney-type bias for cos / is unavoidable, 
but unlike eccentricity, where we expect Hot Jupiters to be 
tidally circularized and therefore e to be exactly 0, there is no 
reason to expect a planet to be exactly edge on. Therefore, we 
are significantly less likely to encounter such a bias. Further, 
the theoretical interpretation of such a system does not qual- 
itatively change if we measure a small inclination, whereas 
a small non-zero eccentricity for a planet that is supposed to 
be tidally circularized requires exotic explanations, such as 
anomalous values of the tidal Q factor or additional bodies 
perturbing the system. However, we still need to be careful 
not to overinterpret nominally-significant impact parameter 
changes if they are close to 0. 

We can avoid a bias in p altogether by thinking about 
how our model would behave if negative values were al- 
lowed. While unphysical, we can make the identification 
that a negative planetary radius would add flux during tran- 
sit. To that end, we allow negative planetary radii, calculate 
the flux decrement as if it were positive, and then add the 
flux to the baseline rather than subtract it. This avoids the 
Lucy-Sweeney-type bias, since a negative value of p implies 
a unique, well-defined likelihood. If the median p is nega- 
tive with low significance, it is likely there is no transit at all. 
If it is negative with high significance, there are likely large 
systematics in the data. If, however, we see a small tail at neg- 
ative values but the result is statistically significant, we can be 
more confident that the non-zero measurement is real, and not 
a result of a bias in fitting. This will be particularly useful 
when measuring small transits with low significance, whose 
depths would otherwise tend to be overestimated, just like ec- 
centricity. We could achieve a similar effect for secondary 
eclipses by allowing the normalization to be negative. 

While we choose to step in the same trick could be 



played with S. However, since p is required to calculate 
the model transit, and p = \/S, we would have to redefine 
p = signi5)y/\S\ to avoid an imaginary p. It is not yet clear 
which would impose a more physical prior 

One may consider stepping in logp, which has no such 
positive-definite requirement, similar to our steps in \ogK, 
logP, or logo//?,. However, in the case of p, systematics 
in the data may be present which mimic a negative p, which 
is not the case for the other parameters. If our model is not 
allowed to fit such a systematic, we would unfairly bias the 
result toward positive values. 

4.6. Transit Best Fit 

Transits are first identified in transit surveys using data 
from relatively small telesco pes that monitor l arge areas 
of flie sky at ori ce (TrES, Alonso et al.' 12004, HATNet, 
i Bakos et al.l J200l XO, McCullough et al. 200^ C oRoT , 
'Baglin et al.' 2006'; SuperWASP, Collier Cameron et al. '20071 
KELT, Pepper et al. 2007; Kepler, Borucki et al. 2010; QES, 
Alsubai et al. 2 01 Ih. They us e a Box Least-Squares (BLS) 
algorithm ( Kovacs et al J 120021) . which extracts the duration, 
depth, period, and Tc of the transits. For the high-precision 
transits which this code was designed for, these quantities will 
be already roughly known, either from the literature or BLS 
fits to their own survey data. With these parameters, the prob- 
lem is greatly simplified to finding a local minimum around 
relatively well-behaved region of parameter space. 

With good starting values for P and 7c, we can begin with 
fairly generic guesses for the rest of the parameters and stan- 
dard local minimization routines like AMOEBA work well to 
find the local minimum. Once found, we follow the same 
procedure as the RV data (^ and scale the uncertainties to 
getP(x2)=0.5. 

Usually, transit fits of survey-quality data are too degenerate 
to robustly fit for the impact parameter, and it must instead be 
fixed to zero (i.e., central crossing). However, because we 
simultaneously fit the stellar properties (see §|5), our code has 
been tested on KELT survey data and works well when given 
a good starting value for Tc and P from an independent run 
of BLS. Therefore, EXOFAST may also be a useful tool for 
vetting grazing eclipsing binaries from survey data. 

5. RADIAL VELOCITY AND TRANSIT 

For simultaneous fits to RV and transit data, the models 
themselves are the same, but the advantage of fitting the data 
simultaneously is that they both constrain many of the same 
parameters, which improves the quality of both fits and ulti- 
mately gives us a clearer picture of the system as a whole. 
Further, we can include additional effects with no penalty, 
such as the light travel time difference due to the reflex mo- 
tion of the star 05.3l l. More important, covariances between 
parameters in the different data sets may be unintuitive and 
non-negligible. Fitting the two data sets separately assumes 
the covariances between the parameters in the two data sets 
are zero, whereas a simultaneous fit naturally takes these co- 
variances into account. 

5.1. Parameterization 

The disadvantage of a simultaneous fit is having to rethink 
the parameterization of the problem, since the overlapping 
constraints are not always intuitive. The parameters logP, 
y/ecosuj^,, ^/esmuJ^,, and Tc trivially overlap between the two 
data sets. logK, 7, 7, are still mostly independent parameters 
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for RV, and Fq, cos/, logo//?*, and p are mostly independent 
for photometry. 

However, the combined parameterization is actually a one- 
parameter family of solutions, meaning that with an estimate 
of the mass of the star, radius of the star, or a clever combi- 
nation of the two, we can solve the entire system precisely, 
including the stellar mass and radius. Of course, the mini- 
mum mass of the planet, Mp sin /, cannot be determined from 
RV without estimating the mass of the primary (even if we as- 
sume ^ Mp), and all of the physical parameters from the 
transit scale with R^, (and we must assume 3> Mp), which 
the transit cannot constrain (ISeager & Mallen-Orn elas 2003). 
Since we must use external information anyway, it behooves 
us to do it during each step in the Markov chain and use all 
of the information of the two data sets to their full advantage 
while simultaneously exploring the covariances between all 
parameters. 

The stellar surface gravity, ^» (often measured as logg), 
is equal to GM^/R^, and is one clever combination of stel- 
lar parameters that allows us to break the degeneracy. With 
that, Kepler's law, and equation|2l the semi-major axis of the 
planet's orbit, true mass of the planet, mass of the star, and 
radius of the star, all in physical units with no approximation, 
become a function of observed quantities: 



47r2 



KPVT- 
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Mp = 
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InKa^Vl-e^ 
GP sin / ' 



(24) 
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With the approximation that ^ +Mp, the latter terms 
in the equations for a and drop out, and K (i.e., radial 
velocity) is no longer required, meaning we could apply this 
technique generally to transit fits alone (of course, losing our 
constraint on Mp). In fact, this is how we actually fit transit- 
only data sets. 

However, we have no constraint on logg from the transit or 
radial velocity alone, so simply adding this parameter to the 
model without additional information will force the Markov 
Chain to inefficiently explore this degeneracy. This problem 
is most usually solved by iterating between light curve and RV 
fitting and isochrone modeling (e.g., Yi et al. 2001) to get the 
model parameters (e.g., B akos et al.. 201 2). but that sort of it- 
eration does not properly account for covariances between the 
stellar and planetary parameters. Worse, those fitting follow- 
up light curves almost always assume the fixed stellar param- 
eters derived from the discovery paper and simply ignore the 
inconsistency between the stellar density implied by their new 
light curve and their assumed s tellar parameters. 

Recently. lTorres et al.l (120101) determined an empirical poly- 
nomial relation between the masses and radii of stars, and 
their logg, effective temperatures, T^ff, and metallicities, 
[Fe/H] (see their Table 4) based on a large sample non- 
interacting binary stars in which all of these parameters 
were well-measured. This is essentially a computationally- 
convenient way of modeling isochrones, which imposes the 
same mass-radius constraint to break the degeneracy, but is 
fast enough to incorporate at each step in the Markov chain. 



Therefore, we add log g, T^ff, and [Fe /H] to our stepping pa- 
rameters and, at each step, we use equation |24] to derive the 
self-consistent M^and /?*that is used to generate the model. 
Finally, we calculate what the Torres relations would predict 
for and R^,, and apply a prior penalty to the for the 
difference between the Torres values and our model values, 
using the scatter about their fitted relations {(Jiogiw, = 0.027 
and (JiogR, = 0.014), as the prior width. 

The constraint on T^ff and [Fe/H] from the Transit data, 
RV data, and Torres relation is very poor, resulting in highly 
uncertain values for M^, and Fortunately, logg, T^ff and 
[Fe/H] can be easily measured with a high-quality spectrum, 
and can then be applied as priors during each step of the fit 
for a precise estimate of the stellar parameters. Therefore, our 
total at each step is 
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plus penalties for any other priors we choose to impose. When 
done this way, the Torres relation, T^ff, and [Fe/H], plus the 
density of the star from the transit can often constrain logg 
better than its spectroscopic counterpart, more directly and 
more precisely constraining and R^,. These constraints, 
in turn, feed back directly to the fundamental planetary pa- 
rameters we care about most (e.g., Mp and Rp). Sometimes, 
logg can actually be better-constrained by the spectroscopy, 
in which case that constraint feeds back into the constraint on 
a/R^,, and the other transit parameters. 

Adding the "prior" penalty to and R^, from the Torres 
relation in this manner is somewhat unconventional. Typi- 
cally, priors are static and come from previously-fit data, not 
model-dependent and derived at each step. To avoid this, we 
considered stepping in logg, T^ff, and [Fe/H] and use the M^ 
and R^, from the Torres relation to break the degeneracy, but 
then we would be solving a one-parameter degeneracy with 
two parameters, over-constraining the model and leading to 
inconsistencies. Said another way, the Torres relation is not 
mathematically self-consistent: the input logg does not pre- 
cisely equal the logg derived from the output and R^,, and 
therefore there would be multiple ways to calculate critical 
parameters. Additionally, the theoretical scatter in the Torres 
relation would set a floor to how well we could measure M^, 
and regardless of other constraints. Finally, if we were to 
do it this way, T^ff and [Fc/H] would define and R^, not 



EXOFAST 



15 



simply constrain it. Therefore, we would lose the power to 
influence T^ff and [Fe/H] away from their prior values. While 
typically, the parameters derived in different ways are statis- 
tically consistent, as we would expect, if we use the output 
and to generate any piece of our model, rather than as 
a prior constraint, the model itself would be mathematically 
inconsistent. Applying these model-dependent priors uses the 
information encoded in the Torres relation, which may more 
appropriately be thought of as data which we merely expect 
to be statistically consistent, while maintaining the mathemat- 
ical sel f-consis tency of the model. 

Eno ch et al.l (|2P10) recasts the Torres relation in terms of 
instead of logg, with the idea being that the transit precisely 
constrains (ignoring Mp), and so it is a more efficient step- 
ping parameter for the Markov chain, which is true. Unfortu- 
nately, for the same reason the transit precisely constrains p,, 
we can no longer derive independent constraints on and 
R^,. Therefore, we would be required to use the output M» 
and R^ to link the RV and transit models, and we are left with 
a mathematically inconsistent model (their input does not 
equal the derived from the output M, and R^ either), and 
we are left with no additional constraint on T^ff and [Fe/H] . 

One reasonable criticism of our method is that it assumes 
the Torres mass and radius relations are independent. To the 
extent this is not true, we are double-counting the prior im- 
posed from the Torres relations, and artificially reducing our 
errors. Alternatively, we can discard a relation, either for 
or for M*, but then we assume the relations are perfectly cor- 
related and we lose information to the extent that they are not. 
A more correct method likely lies somewhere in between, ex- 
plicitly accounting for the covariances between the and 
relations. 

A consequence of using the Torres relations (or any stellar 
model) is that we inherit their limitations - i.e., EXOFAST, 
in its current form, should only be a pplied to "si ngle (post-) 
main-sequence stars above 0.6 Mq" (iTorres et al.l 2010). For 
stars outside of this regime, the prior for the Torres relation 
should be removed, and an independent prior on the stellar 
mass and/or radius should be imposed by editing the func- 
tion (a trivial task assuming another measurement is avail- 
able). The code will issue a warning if this regime is encoun- 
tered, which can be safely ignored if it is only issued during 
the burn-in period. 

A final note about this procedure is that the covariance be- 
tween logg and a/R^ is extreme because they only differ by 
one power of R^. If this covariance is not accounted for when 
stepping in the Markov Chain (e.g., with the DE-MC algo- 
rithm), the Markov Chain becomes extraordinarily inefficient, 
taking hundreds of times longer than it otherwise would. 

5.2. Limb Darkening 

With the values of logg, Teff, and [Fe/H] from the steps 
in the Markov chain (and the observed bandpass), we can 
also derive the limb-darkening paramete rs by interpolating 
the tables from Claret & BloemerJ (1201 ll) . This interpolation 
is done in our module QUADLD. However, the tables contain 
a substantial and poorly-quantified systematic error If this 
error were not taken into account, we would underestimate 
the errors on any co variant parameters. Worse, if the data 
were of sufficient quality to constrain the limb-darkening pa- 
rameters, they would become an implicit constraint on logg. 



dHavek et alj 120 12h . this would in turn bias the stellar pa- 
rameters and all derived parameters. To account for this, 
we do something similar to what we described above with 
the Torres relation. We estimate the error on the quadratic 
limb darkening paramete rs, (t„ i and a,, ?, to be 0.05, based 
on Figure 1 of Claret & BloernenI (1201 ll) . Then, we step in 
both Umb darkening param eters, ui and U2, calculate what the 
IClaret & BloemenI (1201 ll) tables would predict from the cur- 
rent steps in logg, T^ff, and [Fe /H] , for the limb darkening co- 
efficients, ui ciarei and U2.ciarei, and add an additional penalty 
to the (Equation l25Tl of the form: 



U\-U\ ^Claret \ 



U2 ~ ^2. Claret 



(26) 



Teff, and [Fe/H 



(and therefore the mass and radius of the 



star). Since we know the limb darkening tables to be flawed 



Ideally, we would like to have accurate errors for the limb 
darkening coefficients and use more accurate, non-linear limb 
darkening tables from 3D hydrodynamical models, but nei- 
ther are currently available. Additionally, a non-linear limb 
darkening model is currently impractically slow to calculate 
for large data sets, as our speed enhancements described in 
Appendix|B]would no longer apply. 

If an accurate grid of theoretical limb darkening parame- 
ters with well-understood uncertainties were available for a 
wide range of stars, EXOFAST may be able to work back- 
wards, and use the limb darkening to constrain logg, Teff, 
and [Fe/H]- and therefore the masses and radii of stars - 
from a precise light curve alone - a tantalizing possibility. 
In Appendix lAl we describe how one could linearly fit the 
quadratic limb-darkening parameters, which may prove use- 
ful with such a scheme. 

5.3. Reflex Motion 

Having the semi-major axis in physical units provides an 
absolute scale to the system, which allows us to include the 
reflex motion of the star without introducing any additional 
parameters. Equation |2T] assumes a barycentric coordinate 
system, but the RVs are measured in the stellar-centric frame, 
and the transits are measured in the planet-centric frame. The 
light travel time (Roemer delay) between these frames can be 
large, and is analagous to the standard correction from the He- 
liocentric Julian Date (HJD) or Julian Date (J D) to Barycen- 
tric JuH an Date (BJD) in our own solar system (lEastman et al.l 
l20T0bl) . 

One can attempt to correct the flux of the transit (Loeb' 
2005) or the observed radial velocity to account for thisQbut 
it is far more straight-forward to transform the observed time 
stamp into the target-barycentric frame, which is done at each 
step in the Markov Chain with our code BJD2 TARGET. 

This effect is most important when trying to reconcile ob- 
servations for which the information comes from different 
points in space, such as the primary transit, secondary eclipse, 
radial velocity, phase curves, or multiple transits of differ- 
ent planets around the same star. For a typical Hot Jupiter 
{a ^ 0.05 AU) the difference between different reference 
frames is ^ 30 seconds, and it is obviously much larger for 
planets farther from their stars. 

Even in the case of a single observation of a transiting Hot 
Jupiter, the duration of a transit (as seen in the Solar System 
Barycenter frame) differs by 0.15 seconds from its true value 

We could even transform the model flux or model radial velocity to truly 
as-measured, geocentric frame ! 
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if this effect is ignored, which is only marginally negligible 
by today's standards. For Hot Jupiters, the offset between 
the stellar-centric and bary centric frame is negligible (^10 
ms), but grows with the semi-major axis and the mass of the 
companion. 

While the target-barycentric times are important for dynam- 
ical studies of Transit Timing Variations (TTVs), they are 
really only necessary when there are two transiting planets, 
which our code does not consider With a single planet, the 
offset between the frames for all primary transits will be con- 
stant, so the TTVs will not be different. To avoid confusion, 
and because the times in the Solar System Barycentric frame 
(i.e., BJDtdb) are most precisely knowrPl we only quote the 
times in BJDtdb- 

This conversion back to BJDtdb is done by our module, 
TARGET2B JD. To be clear, this procedure automatically rec- 
onciles the locations of each observation, which for example, 
naturally constrains the eccentricity from the primary transit 
and secondary eclipse timing. 

We could calculate the Roemer delay due to the systemic 
velocity, 7, with no additional parameters. Since the system 
may be receding or approaching us, the systemic velocity will 
tend to stretch out or compress the arrival time of the pho- 
tons by a constant factor, 1 +7/C. However, since RVs are 
often arbitrarily normalized, we do not always have a good 
measurement of the absolute systemic velocity. More impor- 
tant, the only practical effect such a correction would be to 
change the period by a factor of 1 +7/C. While this < 0.1% 
difference in period is already highly statistically significant 
in many systems, the practical implications of such a differ- 
ence will be completely obscured by the uncertainties in the 
other parameters for the foreseeable future, and quoting this 
corrected period would only make deriving the ephemeris in 
the observed frame more error-prone. Therefore, we actively 
chose not to correct for the additional Roemer delay intro- 
duced by 7. 

There are several other minor effects on the arrival time of 
the photons that we ignore, because they would require ad- 
ditional free parameters which are generally not well-known 
and the effect is very small. These include the light travel 
time from our baryc entric fr ame to the target barycentric 
frame, proper motion (lRafikovii2009) . parallax (Scharf 20071) . 
and general relatiy istic precession ([Jordan & Bakosri2008l 
iPal & Kocsisl[200a) . 

5.4. Ignored Effects 

The ignored effects are numerous, and include the Rossiter- 
Mclaughlin (RM) effect. Transit Timing Variations (TTVs) (if 
multiple epochs are fit simultaneously, a constant ephemeris 
is assumed), secondary eclipses, a distance constraint on the 
stellar radius, reflection from the planet, ellipsoidal variations 
of the star, relativistic beaming, lensing, gravity brightening, 
non-Keplerian gravitational interactions, and tidal effects, to 
name a few. It is expected that for a given system, many will 
want to modify the code slightly to include some of these 
additional effects. We have attempted to make the code as 
modular and comprehensible as possible, such that someone 
familiar with IDL could use EXOFAST as a starting point or 
template for their own, more specialized code without neces- 
sarily needing to master the details of modeling. Future ver- 
sions of our code may include some of these ignored effects 

i.e., it does not include tlie uncertainty tlie light travel time due to the 
uncertainty in the target's semi-major axis 



as we have occasion to worry about them, and those codes 
may be made publicly available. 

5.5. Combined Fit 

Now, with a mere 16 parameters, 7, 7, Tc, \ogP, y^cosw*, 
v^sinw*, \ogK, cos/, p, Fq, loga/R^, \ogg, Tgff, [Fe/H], ui, 
and U2, we can describe the both the radial velocity and transit 
data simultaneously, including often neglected effects. 

Since we have a very good guess for the starting values for 
each parameter from the individual fits, no global fit is re- 
quired. Assuming they are consistent with one another, we 
only need a slight refinement to find the best fit of the com- 
bined data sets. Therefore, using the starting values for the 
best fits we found in both individual cases, we run a quick 
AMOEBA minimization to find the local minimum of the com- 
bined data set, and we are finally ready to use our MCMC 
code, EXOFAST_DEMC, to determine the model uncertainties 
and covariances. 

To summarize, our procedure to calculate the after we 
have stepped in the 16 parameters above, is as follows: 

1 . Check the boundary conditions for each parameter If 
any parameter is out of bounds, we immediately return 
X^ = oo. 

2. Compute a, 7?*, Mp, and M* (Equation[24li. 

3. Compute M* and from the Torres relation. 

4. Interpolate the IClaret & BloemenI (120111) tables to get 
the quadratic limb darkening parameters at the given 
\ogg, Teff, and [Fe/H]. 

5. Compute Tp. 

6. Correct the observed times to the target's barycentric 
frame ( ^53[ . 

7. Calculate the RV model (©■ 

8. Calculate the transit model (® . 

9. Compute the total (Equations |25] and |26]| . 

6. AN EXAMPLE 

To explain how to use the code, describe its outputs, and 
validate it at the same time, we now follow an example fit 
from beginning to end using RVs and transit data of HAT- 
P-3b from Torres et all (12007b (T07 hereafter), hosted on the 
exoplanet archive. This system was chosen nearly randomly 
- we just looked down the list of HAT planets and took the 
first one with enough public data and only one source for RVs 
(since the public version of EXOFAST does not fit multiple 
RV zero points). 

Since the transit data were in BJDutc and magnitudes, we 
converted it to the required format of BJDtdb, and normalized 
flux, and wrote them to a file called "hatS.flux."' We also 
converted the times of the RV data points from HJDutc to 
BJDtdb and wrote them to a file, "hat3.rv." 

Next, we have to make decisions about what information 
we have the power to constrain and which parameters should 
be constrained by priors from external information, or held 
fixed. As is always the case, we must adopt priors for T^ff and 
[Fe/H] based on spectra, which T07 quote in their Table 2. 
As we explain later, we also choose to use the spectroscopic 
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constraint on logg. For all spectroscopic priors, we use the 
larger uncertainties they describe in the text. Like T07, due 
to relatively poor phase coverage, there was no power to con- 
strain the eccentricity, so we fixed it to zero and did not at- 
tempt to fit a slope. The relatively short range of data for the 
RV meant the transit survey data constrained the period signif- 
icantly better, so T07 fixed their period at that value. Instead, 
we use the value from their photometry as a prior Note that 
a suitable fit was possible without this prior; the uncertainties 
were just larger Finally, we must specify the band in which 
the transit was observed, in this case, Sloan i'. 

Once we have decided on what to fit, we simply call the 
code to reflect our wishes: 

EXOFAST, TRANPATH= 'hat3 . flux' , /NOSLOPE, $ 
/CIRCULAR, PRIORS=priors, BAND= 'Sloani' , $ 
RVPATH= 'hat3 . rv' , PNAME= 'HAT-P-3b' , $ 
MINP=2 . 85, MAXP=2 . 95 

where the name of the observed bandpass is given as 
"BAND", "PNAME" is the case-insensitive name of a planet 
in exoplanets.org, from which the starting values or spectro- 
scopic priors can be pulled. This is generally not necessary, 
but gives the fit a more robust starting point. The "PRIORS" 
input is a 2D array containing the prior value and 1-sigma 
uncertainty for each parameter To specify a parameter with 
no prior, a width of infinity should be used. The parameters 
"MINP" and "MAXP" limit the range of the Lomb-Scargle 
periodogram, and is typically not necessary, but these RV data 
are too sparse to get reliable results from the periodogram. 
Further details on the calling structure and additional features 
can be found in the documentation of the code. 

EXOFAST then fits the RV data as described in ^ scales 
the uncertainties, fits the transit data as described in §3] scales 
its uncertainties, then fits the two data sets combined. It deter- 
mines the appropriate stepping scale, and begins the Markov 
chains, giving a status update about the number of accepted 
steps, and estimates the time to completion, if it were to take 
the maximum number of steps. Once it has taken 5% of the 
maximum number of allowed steps, it will estimate if the 
chains will be well-mixed by the time it is done. If it is 
expected to be done before it hits the limit, it will estimate 
when. If not, it will recommend a thinning factor In our case, 
the chains were well-mixed in about five minutes on a stan- 
dard desktop computer This is slightly faster than a general 
run because the orbit was circular This eliminates two free 
parameters and makes the normally expensive solution to Ke- 
pler's equation trivial. Still, for a similar number of points, ten 
minutes is fairly typical, even when eccentricity is included. 

The estimated time of completion, or the recommended 
thinning factor is very rough, and should only be trusted to 
a factor of 2-5. While the thinning factor of Nthin means it 
may take up to Nthin times longer, it will stop as soon as it 
is well mixed, so it pays to be a little conservative - for that 
reason, the suggested value is twice what it actually calculates 
it needs. The mixing criteria described in ^2.21 are very con- 
servative. Indeed, we have run chains with fewer than 100 in- 
dependent steps (as opposed to the required 1000) that did not 
differ significantly from chains that were fully mixed. There- 
fore, if you find that the recommended thinning value would 
imply a prohibitively-long execution time, you may wish to 
proceed with caution in interpreting a chain that gives such a 
warning. Otherwise, restarting the chain with the suggested 
thinning factor is highly recommended. Alternatively, if you 
have a 64-bit machine (and a 64 bit version of IDL), you could 



increase the maximum number of steps by this same factor, 
but there is very little difference in the quality of the final 
output or the execution time, and thinning makes the chains 
smaller and more manageable without throwing away much 
useful information. 

6.1. Outputs 

Once the program runs to completion, there are several out- 
puts. First, are publication-ready plots of the data with the 
best-fit model overplotted, and residuals below, as shown in 
Figure |5]and|6]for our example. 




0.0 0.2 0.4 0.6 0.8 1.0 
Phase + (Tp - Tc)/P + 0.25 



Fig. 5.— The best-fit of HAT-P-3 RV data from T07, as output by EXO- 
FAST, forced to a circular orbit and with no slope. The units of the x-axis are 
chosen so primary transit will always be at 0.25, and for circular orbits (as in 
this case), secondary eclipse will be at 0.75. 
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Time - Tq (hrs) 

Fig. 6.— The best-fit transit HAT-P-3b data from T07, as output by EXO- 
FAST. 



By setting the debug flag, these plots can be drawn to the 
screen during each call to the routine. As the name im- 
plies, it is useful for debugging if the fit is not working. It 
can also be instructive - if, for example, the debug flag is left 
on during an AMOEBA fit, it will essentially play a movie of 
the routine settling into the best fit. In this manner, one can 
gain an intuitive feel for how the algorithm works (or why it is 
failing). It is also instructive to leave it on at the beginning of 
an MCMC fit, to gain intuition for the difference between the 
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steady convergence toward the best fit of the AMOEBA al- 
gorithm versus the more stochastic MCMC algorithm, which 
wanders around the vicinity of the best fit. Obviously, this 
slows down the fit by orders of magnitude and is not intended 
to be used during a standard fit. 

Perhaps the most powerful aspect of MCMC is the fact that, 
with the parameter chains in hand, it is trivial to generate me- 
dian values, uncertainties, and covariances for derived quan- 
tities - we simply perform the appropriate transformation to 
each step in the entire chainQ As such, we quote the median 
values, along with their 68% confidence intervals for several 
other derived quantities of interest as a publication-ready La- 
TeX source code file for the deluxe table shown in Table |2] 
EXOFAST_LATEXTAB, which generates these tables, auto- 
matically rounds the two-sided uncertainties to two significant 
digits each and rounds the median value to the higher preci- 
sion uncertainty. When the uncertainty is symmetric, we use 
the ± symbol; otherwise, we list both uncertainties in the stan- 
dard fashion. This fast and accurate way to go from parameter 
arrays to properly-formatted latex source code is not just con- 
venient, but also mitigates the risk of typos introducing catas- 
trophic errors. Even if it is not intended to be inserted into 
a paper, it is often more readable as appropriately-rounded 
LATEX code than simply printing the median values, particu- 
larly since it reorganizes the fitted and derived quantities into 
a more intuitive layout, as we have done in Table |2] to group 
parameters into stellar, planetary, RV, transit, and eclipse cat- 
egories. 

However, just because the program will provide an an- 
swer that is easy to publish does not necessarily mean it is 
publication-ready. It is always wise to inspect the parameter 
distributions for strange behavior that may compromise the 
results, and check that the parameters themselves make sense. 
As such, we also output the probability distribution functions 
for each parameter, including derived parameters - a subset of 
which is shown in Figure|7] The best-fit values (the lowest 
achieved by the Markov chain) are shown as a solid vertical 
line over each distribution. These plots are not typically ex- 
pected to be published, but are rather for diagnostic purposes 
only. 

Another interesting diagnostic is the plot of covariances, a 
subset of which are shown in Figure|8] We plot contour plots 
of each parameter against each other parameters, including 
derived parameters, where the contours show the 68% and 
95% confidence intervals. We hide the numerical values on 
the axis labels for readability - the shape is the most impor- 
tant diagnostic here. The value above the plot is the correla- 
tion statistic. The solid black dot is the best-fit value of the 
two parameters. We note that our routine to generate these 
contours, EXOFAST_ERRELL, is not a standard IDL routine 
and may be of general interest. Again, the roughness of these 
covariance plots would smooth out with longer chains. 

The parameter chains themselves may be useful for addi- 
tional diagnostics or analysis not performed by EXOFAST, 
such as creating publication-quality plots of particular pa- 
rameter distributions or covariances. We output an IDL save 
file that includes the array of steps populated by the Markov 
chain, including all derived quantities. This array includes the 
burn-in, in order to maintain a complete record of all steps. 
The save file also includes the corresponding each step, 
the latex-style names of all parameters, and the index of the 

Note that our piiors are not likely uniform in any of these derived pa- 
rameters, and we should be aware of that as we interpret these values. 



TABLE 2 

Median values and 68% coneidence interval eor HAT-P-3B 



Parameter 



Units 



Value 



Stellar Parameters: 

M* Mass(M0) 

R, Radius (Rq) 

L* Luminosity (Lq) 

p« Density (cgs) 

log(g« ) . . Surface gravity (cgs) 

Teff Effective temperature (K) . 

[Fe/H].. Metalicity 



Planetary Parameters: 

P Period (days) 

a Semi-major axis (AU) 

Mp Mass (Mj) 

Rp Radius (Rj) 

pp Density (cgs) 

log(g/j) . . Surface gravity 

Teq Equilibrium Temperature (K) . . . 

Safronov Number 

(F) Incident flux (lO' erg s"' cm"^) 

RV Parameters: 

K RV semi-amplitude (m/s) 

Mp sin (■ . . Minimum mass ( Mj ) 

Mp/Mf . Mass ratio 

7 Systemic velocity (m/s) 

Primary Transit Parameters: 

Tc Time of transit (BJDtdb) 

Radius of planet in stellar radii . 
Semi-major axis in stellar radii . 
linear limb-darkening coeff .... 
quadratic limb-darkening coeff . 

1 IncHnation (degrees) 

b Impact Parameter 

Transit depth 

FWHM duration (days) 

T Ingress/egress duration (days) . . 

Total duration (days) 

Pt A priori non-grazing transit prob 

Pj Q A priori transit prob 

Fq Baseline flux 

Secondary Eclipse Parameters: 
Ts Time of eclipse (BJDtdb) 
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first useful link in the chain (before which is considered the 
"burn-in"). 

6.2. Validation 

Generally, our results are in very good agreement with those 
found by T07. All quoted values agree with theirs within Icr, 
and the vast majority agree to better than 0.25(7. The deter- 
mination of the stellar properties is the most fundamental dif- 
ference between our two methods. We use the relations from 
iTorres et"aD (12010) and the spectroscopic priors, to enforce 
consistency with our transit and RV data at each step, whereas 
they fit the transit, then use the fitted d ensity as a constraint to 
their stellar isochrones (lYi et al.ll2001b in a later step to derive 
the stellar properties and iterate dSozzetti etaLll2007l . While 
this is an attempt to do a similar thing, their process results 
in statistically consistent, but not identical values for from 
the stellar parameters (2.36 g cm"-') and the from the fitted 
transit parameters (2.67 g cm""*). However, our densities from 
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Fig. 7. — A demonstration of the PDFs output by EXOFAST. Normally, these are not intended for publication, but for diagnostics. This shows a subset of 
the parameter distributions for the combined RV + transit fit to the T07 data of HAT-P-3b, as generated by EXOFAST_PLOTDIST. The line marks the best-fit 
values corresponding to the minimum x ^ amongst all steps in the Markov Chain. The slight roughness in these PDFs would smooth out with longer chains, but 
the convergence criteria ensure that longer chains are not necessary to determine the median values or their uncertainties reliably. 
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the two sets of parameters are necessarily equivalent (2.78 g 
cm"-'). 

The uncertainty in the transit time from T07 is almost an 
order of magnitude larger than what we find. Given the good 
agreement between everything else, we suspect a typo in T07 
is to blame (e.g., a missing z ero), as our uncert ainty is more in 
line with analytic estimates (ICarter et al.ll2008l) - thus demon- 
strating a major benefit of a code that automatically generates 
the LaTeX source code of the final parameters directly from 
the data. 

Other than that, all of our uncertainties agree within 30%, 
with neither of us finding systematically smaller uncertainties 
across all parameters. Unsurprisingly, the largest differences 
are for the stellar parameters, for which we used fundamen- 
tally different methods. While we in principle, can de rive un- 
certainties that are smaller than the scatter in the Torre s et al] 
((2010) relation due to the additional constraints of the RV 
and transit data, our uncertainties are still dominated by that 
scatter These propagate to slightly larger uncertainties in the 
planetary parameters. 

The radial velocity portio n of this code has be e n used to 
fit the radial velocity data in Flemi ng et al.l (120101) . iLee et al] 
( 1201 1 ), and Wisniewski et al. (20121), though minor modifica- 
tions were required to fit different zero points for each instru- 
ment, as described in the text of those papers. 

We used this code to fit the KELT- lb data dSiverd et alj 
120121) . adding the ability to fit an arbitrary number of tran- 
sits (with TTVs), RV zero points, and the Rossiter Mclaugh- 
lin effect. We also modified o ur code to fi t the KELT-2Ab and 
KELT-3b ( Beattv et al.,,2()H iPepper et al..,2012) with an ar- 
bitrary number of transits, deblend the stars, and constrain the 
stellar radius throug h the H ipparcos distance prior 

As described in ^3.4.21 we ran 100 fits of simulated RV 
curves for each of 15 different intrinsic eccentricities (1500 
total fits in our preferred parameterization). With the excep- 
tion of the Lucy-Sweeney bias for eccentricity discussed in 
detail there, all measured values for the other 5 parameters 
were statistically consistent with the input values, finding only 
312 of the 7500 parameters outside of 2cr (341 expected), and 
10 outside of 3cr (20 expected). Further, all of these simula- 
tions were robustly fit without any intervention or assumption 
of the input values. 

7. ONLINE TOOLS 

7.1. READEXO 

All of the following routines make use of READEXO, an 
IDL program to re ad the current exoplanets.org database 
dWright et al.ll201 fl) into an IDL structure. It is available for 
download and use offline as well. Each header entry, de- 
scribed on their websit^IZl becomes a unique tag name for the 
structure. The code will automatically adapt to the addition 
of rows and columns into the exoplanets.org database, and a 
flag can be set to automatically update the local copy. This 
code is useful for more general manipulation and comparison 
of all exoplanetary data than is allowed by their web inter- 
face, and for integration into software suites. In our code, we 
use it to seed the fits for known planets to make the fits more 
robust, and for retrieving priors on log g, Teff and [Fe /H] to 
get the host properties, rather than requiring the user to sup- 
ply them. However, we caution strongly that the selection 
effects inherent to this sample of planets are very poorly un- 

http://exoplanets.org/help/common/data 



derstood and poorly quantified - care should be taken not to 
over-interpret results that these tools make triv ial. If you use 
this code, please also cite IWright et al.l (1201 l l) and acknowl- 
edge their efforts in making and maintaining this incredibly 
useful database. 

7.2. Ephemerides 

There are now so many transiting planets that, on any given 
night from any given location on Earth , we are very likely to 
be able to observe at least one of them (Eastm an et al.ll20^10al) . 
It is useful, then, to quickly determine which those are in or- 
der to plan observations or fill gaps in observing schedules. 
We present an online calculator that determines which planets 
are transiting or eclipsing from a particular observatory on a 
particular datePl This tool uses the exoplanets.org database, 
which is synced daily. For the predicted secondary eclipse 
times of non-circular orbits and especially the transit times 
of RV planets, the precision is likely to be much worse than 
what we could derive from the original data because these 
times were derived from the values of P, e, lu^. given in exo- 
planets.org and thus do not include the often large covariances 
between these parameters. These could be greatly improved 
if, in the future, published result s included Tp and Ts directly 
from the fits, like HAT does (e.g. lBakos et al.l2012l) . and these 
were included in the exoplanets.org database. 

7.3. RV and Transit fitting 

We provide an intuitive online interface to the basic fea- 
tures of EXOFASlQ which can fit either or both of the RV 
and transit, including an arbitrary number of systematics. Pri- 
ors on Teff and [Fe/H] are required, but can be automatically 
pulled from the exoplanets.org database by selecting the ap- 
propriate planet from a pull down menu. This pull down menu 
will update daily as new planets are added to the exoplan- 
ets.org database. When fitted, the transit (normalized to 1 and 
with systematics removed) and the RV will be plotted and 
both the fitted and derived parameters will be output to the 
screen. 

Unfortunately, since these are run on the server side, it is 
not practical to support the full DE-MC code, so only the 
best-fit values are reported. Therefore, this online fitter is not 
intended for research-quality fits. 

7.4. Quadratic limb darkening 

Our last onh ne code linearly interpolates the 

IClaret & BloemenI ( 1201 ll) quadratic limb darkening ta- 
bles for given values of \ogg, T^ff, and [Fe/H] in any of the 
following bands: Johnson/Cousins U, B, V, R, I, J, H, and K\ 
Sloan m', g' , r' , /', and z'; Spitzer 3.6 pm, 4.5 pm, 5.8 um, and 
8.0 fim; Kepler, CoRoT; and Stroemgren u, v, b, and aFI. This 
can also draw the stellar parameters from the exoplanets.org 
database, if available. 

We would like to thank Howard Relies, Rachel Ross, Karen 
Collins, Thomas Beatty, and Trey Mack for beta tests, bug 
reports, and excellent suggestions for improvements; Eric 
Ford for CUDA translations of bottlenecked routines, Robert 
Siverd for useful discussions about the eccentricity parameter- 
ization and random number generator; and Wayne Landsman 

http://astroutils.astronomy.ohio-state.edu/exofast/ephem.shtml 
" http://astroutils.astronomy.ohio-state.edu/exofast/exofast.shtml 
http://astroutils.astronomy.ohio-state.edu/exofast/limbdark.shtml 
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Fig. 8. — A demonstration of a small subset of the covariance plots output by EXOFAST. Normally, these are not intended for publication, but for diagnostics. 
This shows some of the covariances for the combined RV + transit fit of HAT-P-3b. The contours are the 68% and 95% probability contours, and the black dot is 
the best-fit value. The number above each plot is the covariance between the parameters. 
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for curating the IDL astronomy li brary, from which we use 
many functions dLandsmanl 1 1 993h . This research has made 
use of the Exoplanet Orbit Database and the Exoplanet Data 
Explorer at exoplanets.org. This research has made use of the 
NASA Exoplanet Archive, which is operated by the Califor- 
nia Institute of Technology, under contract with the National 
Aeronautics and Space Administration under the Exoplanet 
Exploration Program. This work was supported in part by 
NSF CAREER grants AST-1056524 and AST-0645416. 



APPENDIX 

ANALYTIC FIT OF THE LINEAR AND 
QUADRATIC LIMB DARKENING 

Given the quadratic limb darkening law (Eq. \W\ and the 
flux during transit (Eq. |20] |. if we make the following defini- 
tions 



xq = l-X\ 

xi=2/3(X'-e{p-z))-X'', 
co=Fo, 

Ml + 2U2 



(Al) 



C2=Fq 



1-M1/3-M2/6' 

U2 



1-M1/3-M2/6' 
the flux during transit can be written: 

F = cqxo + C1X1+ C2X2 ■ 



(A2) 



Since xo, xi, and X2, are solely functions of the tran- 
sit geometry (p and z), they can be calculated indepen- 
dent of the limb darkening and are now optional outputs of 
EXOFAST_OCCULTQUAD. Using these, we can analytically 
solve for the coefficients cq, ci, and C2 by performing a stan- 
dard linear minimization (see Gould 2003). Then the limb 
darkening coefficients, ui and U2 simply become: 



Ml = 
M2 = 



Cl-2C2 



C0 + C1/3-C2/2 
C2 

C0 + C1/3-C2/2 



(A3) 



This analytic fit can greatly increase the speed of any fit- 
ting routine by reducing the dimensionality of the non-linear 
solver, particularly in cases where data are taken with multi- 
ple bands. However, in the low signal-to-noise regime where 
the data have little power to constrain the limb darkening, 
this analytic fit can give non-physical results, in which case 
it is better to fit a linear limb dark ening law, fix the val- 
ues at the IClaret & BloemerJ (1201 ll) values, fit them non- 
linearly with a prior, or allow them to vary by interpolating 
the Claret & Bloemen (20 11) limb darkening tables with each 
new \ogg, Teff, and [Fe/H] during the Markov chain. In addi- 
tion, such a hybrid fit must be used with care, as discussed in 



These relations trivially simplify in the case of linear limb 
darkening, when M2 = 0: 



Jt:o= l-A" 

xi=2/3{X'-eip-z))-X'' 
co = Fo 

Ml 



(A4) 



Cl 



l-Mi/3 



And then, 



Ml = 



co + ci/3 



(A5) 



Given the same form (xq and jci are the same in the quadratic 
and linear cases), one could even fit both laws to see if the im- 
provement in justifies the addition of the extra parameter. 

EXOFAST_OCCULTQUAD 

The original IDL code to analytically c alculate the flux 
decrement during transit presented in Mandel & Agoll (120021) 
was more than an ord er of magnitude f aster tha n the previ- 
ous ri umerical method (ICharbonneau et al.n2000i: iHenry et al.l 
I2OOOI) . However, the IDL version was a simple line by line 
translation of the Fortran code, which does not take advan- 
tage of the fact that IDL is a vector language. In our code, 
EXOFAST_OCCULTQUAD, we sped it up by a factor of 100- 
500 and fixed many bugs. 

Speed Enhancements 

The majority of the speed enhancement came from a ma- 
jor restructuring of the code to take advantage of IDE's much 
faster vector operations and careful attention to optimal case 
ordering. 

With the aid of the IDL profiler, we determined the largest 
remaining bottleneck was in the calculation of the elliptic in- 
tegral of the third kind, which used translations of the nu- 
merical recipes routines r j, rc, and rk. After testing po- 
tential replacements, including IDE's separately-licensed rou- 
tines IMSL_ELRC, IMSL_ELRJ, and IMSL_RLRD, we re- 
placed the numerical recipes routine s with an IDE translation 
of an ALGOL program published by iBulirscl? (1 965 a b ) . This 
iterative routine is many times faster and more robust than any 
other alternative we tried. 

For our last minor speed enhancement, we combined the 
routines to calculate Hasting's polynomial approximation for 
the elliptic integrals of the first {K{k)) and second {E{k)) kind, 
which now share the computationally expensive task of com- 
puting log(l -k). We also tested other replacements for this 
routine, including Fukushima's piecewise polynomial approx- 
imation, (iFukush ima 2009), but none was reliably faster 

The calculation of the elliptic integral of the third kind is 
still the primary bottleneck, but is now comparable to the cal- 
culation of the elliptic integrals of the first and second kind, 
the arc cosine, logarithm, and IDE's WHERE function, so addi- 
tional substantive speed gains will be difficult without a fun- 
damental re-characterization of the problem. 

These speed enhancements combined make this IDL rou- 
tine ^100 times faster than the previous IDL version. Since 
this calculation is a significant fraction of calculating a transit 
model, this improves the run time of entire program by a sig- 
nificant factor We rewrote the Fortran routine with the latter 
two enhancements, which is now twice as fast as its predeces- 
sor, and 2-5x faster than the current IDL version, depending 
on the compiler. 
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Lastly, we include an IDL wrapper for the Fortran version, 
which uses CALL_EXTERNAL and is a drop-in replacement 
for the strictly IDL version described above. After its first 
use, it is indistinguishable from the native Fortran in terms of 
speed, and is therefore 200-500 times faster than the original 
IDL version, and 2-5 times faster than the current IDL code, 
depending on Fortran the compiler used. 

The downside to this wrapper is that the syntax required 
by IDL is technically not legal Fortran. The makefile in the 
IDL example does not implement f77 on Linux, likely be- 
cause of compilation warnings. Some compilers, like gfor- 
tran and f95 simply refuse to compile it. It can be compiled 
with g77 or f77, and the warnings they issue can be safely 
suppressed. Only ifort will compile it without warnings. This 
CALL_EXTERNAL version has only been tested on a 32 bit 
Linux machine. 

For those that wish to use a high-level, but non- 
proprietary language, we include a Python version of 
EXOFAST_OCCULTQUAD, which uses NumPy and mirrors 
the vector structure of the IDL version. Therefore, we ex- 
pected this to be similar in execution time to the IDL, but it 
was actually 5 times slower This may be a general statement 
about Python, or simply a result of the fact that its authors are 
expert IDL programmers and only novice Python program- 
mers. 



Bug fixes and conceptual enhancements 

In addition to being relatively slow, the numerical recipes 
codes to calculate the elliptic integral of the third kind (rc, 
r j, rk) were poorly -behaved where values of z were within 
10"^ of special cases (first, second, third, and fourth contact). 
Very near special cases (10~'^), the previous codes differed 
from the true value by as much as 10%. With the iBulirschI 
([i965a b) algorithm to calculate the elliptic integral of the 
third kind, the differences between the calculated value and 
true value (calculated with arbitrary precision in Mathemat- 
ica) are less than 10"^. Since any model that suffered from 
this bug would be unfairly disfavored, the practical implica- 
tion of this bug is the potential for a bias in the measured times 
that pushes data points away from these special cases. For a 
typical Hot Jupiter, 10"^ corresponds to ^ 90 ms in the plan- 
etary orbit. Fortunately, this is beyond the current precision 
attained to date. However, diabolical alignments of several 
data points may further skew the inferred value. 

We also fixed typos in cases where p > 1 , as in secondary 
eclipses. While these typos would have caused catastrophic 
failures, the behavior of this bug is sufficiently non-physical 
they would have been immediately found, so were unlikely to 
have done any harm. We also fixed a handful of other minor 
bugs, such as consistent use of double precision values and (in 
Fortran) functions. 

The major conceptual enhancements were mentioned pre- 
viously. Namely, we now allow negative values of p, as dis- 
cussed in 34.51 and we return the coefficients required to an- 
alytically fit the quadratic limb darkening parameters as dis- 
cussed in Appendix IaI 

Thorough testing at every special case, ±10"'^ of every spe- 
cial case, and for a thousand uniformly spaced values of z be- 
tween z = and z = 2{ \+p) for hundreds of values of p ranging 
from 10"'^ to 2 have been checked, and no other errors were 
found. 



RANDOM NUMBER GENERATOR 

IDE's help states that RANDOMU, their built-in random num- 
ber generator, is similar to Numerical Recipes rani, which 
has a periodicity, m, of ^ 10^, and should suffice for a series 
of random numbers of length, n, such that n < m/2Q w 50 
million. 

In our sample fit of HAT-P-3b, we generate 16 random num- 
bers per step: one for each of the 1 3 parameters, two to choose 
the random chains, and one to compare to the likelihood ratio. 
We construct 26 simultaneous chains, each with a maximum 
of 100,000 steps. This means that in total, we may generate 
up to 41.6 million random numbers during the course of the 
fit. While this is just below the recommended number, with 
a few more free parameters (recall that we fixed the slope to 
zero and forced the orbit to be circular) or thinning, we could 
easily exceed the periodicity of the generator 

Initially, we were incredulous that the results could be sub- 
stantively affected by the periodicity of the generator, given 
that the chains wander around in N-dimensional parameter 
space and so are unlikely to take the same step from the 
same place. Therefore, we constructed another random num- 
ber generator which cycled through the first 10007 (a prime 
number) random deviates generated by RANDOMU to artifi- 
cially reduce its periodicity and more readily test this effect. 
While the acceptance rate was ideal, the chains were well- 
mixed according to our criteria, and the resulting probability 
distribution functions looked acceptable by eye (i.e., no ob- 
vious signs of any malfunction), the median parameters dif- 
fered significantly (in some cases by more than 1. 5a) from 
the results using RANDOMU. Indeed, it has been shown nu- 
merous times that the quality of the random number generator 
can affect the scientific conclusions in other apphcations (e.g., 
Kalle & Wansle benll 1 984t iParisi & Rapuandl 1 9851: iFilk et alj 
1985; GrassbergliMly 

While looking for a suitable replacement, we found that the 
third edition of Numerical Recipes (Press et al. 2007) warns 
"be cautious about any source earlier than about 1995, since 
the field progressed enormously in the following decade." 
They also state that one should never use a random number 
generator with a periodicity of less than ^ 2 x 10'^. This cer- 
tainly excludes rani, and even excludes their own popularly- 
used alternative ran2, published in 1992, with a periodicity 
of 10'^ 

Numerical Recipes recommends a specific implementa- 
tion of a random number generator, which was translated 
into IDL by Paolo Grigis and included with our distribution 
as PG_RAN. It has a periodicity of ^ 10^^. Our program, 
EXOFAST_RANDOM, is a wrapper for this routine to allow 
it to return an 8-dimensional array of uniform and Gaussian 
random numbers in order to make it a drop-in replacement 
for RANDOMU. While this version is 120 times slower than 
RANDOMU because random number generators are serial by 
nature, and so not ideally implemented in IDL, its contribu- 
tion to the overall runtime of the program is only about 3%. 
A fast, built-in random number generator would essentially 
eliminate this contribution completely. 

We repeated the fit of HAT-P-3b again, now using this 
generator Because we did not approach the periodicity of 
RANDOMU in this fit, it is comforting that we found no statisti- 
cal difference. Further, we first did this test before implement- 
ing DE-MC. Repeating this test afterward, we found the effect 
is much less significant when using DE-MC, but still notice- 
able, than in a standard MCMC implementation because it is 
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so much more efficient and therefore we take a small frac- 
tion of the maximum number of steps. In addition, the steps 
we do take are dominated by the two random numbers that 
choose the chains, not the uniform random deviate for each 
parameter For generality, and because the overall speed hit 
is relatively small, we always use our own, slower generator. 
By setting a simple flag, the user can override this default be- 
havior to select RANDOMU or any other generator, as long as 
it has the same calling structure and has the ability to return 1, 
2, and 3 dimensional uniform and Gaussian random deviates. 

NEGATIVE ECCENTRICITY 

When investigating eccentricity parameterizations, we ex- 
plored one that, to the best of our knowledge, has not been 
tried before: allowing the eccentricity to be bounded between 
-1 and 1. While formally undefined, we were motivated to 
allow a negative eccentricity by considering its definition, 
e = (rg- rp)/(ra + rp), where is the distance to the focus 
at apoapsis and is the distance to the same focus at peri- 
apsis. Therefore, a negative eccentricity implies that we have 
incorrectly labeled our periapsis and apoapsis. When e < 0, 
we redefine a new eccentricity, e' = -e. Since that flips the 
periapsis and apoapsis, we also need to change the argument 
of periastron accordingly: = oj* + tt. Since changing the 
argument of periastron also shifts the time of periastron, we 
need to make sure we calculate the Tp after changing w*. In 
the end, however, we cannot allow negative eccentricities in 
our final distribution because it will create a discrete degen- 
eracy between e and a;,, and -e and cli^ + tt. In a properly- 
sampled Markov Chain, the median value of e would always 
be 0, and for statistically significant eccentricities, the chain 
is likely to get stuck at either the negative or positive eccen- 
tricity and never sample the other We can avoid this by ex- 
plicitly changing our values of e and w*, following the above 
prescription, as soon as e steps to a negative value. Further, if 
oj* is outside of the normally allowed bounds, -tt < < tt, 
we can simply add or subtract Itt until it is within bounds. 
This scheme does not affect our uniform prior in either e or 
oj*. By eliminating regions of parameter space that are out 
of bounds and therefore always rejected, the autocorrelations 
are smaller and the chains converge faster Unfortunately, be- 
cause we must rescale the eccentricity, it does not get rid of 
the Lucy-Sweeney bias as we had originally hoped. 

The resultant prior distributions and a posteriori distribu- 
tions were indistinguishable from the -yicoso;*, -yisino;* pa- 
rameterization. When using a standard Markov Chain imple- 
mentation and for small eccentricities, we found this parame- 
terization to be about 20% faster than any of those discussed 
in EH With the DE-MC code, it was about 20% faster than 
stepping directly in e, where < e < 1, but still nearly half 
as fast as parametrizing it in terms of ecosw* and esinw^or 
Y^coscj, and ^/esino;,. So while we generally recommend a 
DE-MC stepping in -y/ecosoj* and y^sino;*, we still present 
this alternative parameterization in case others find it useful. 

One useful application of the same idea would be when 
incorporating the RM effect, where the projected rotational 
velocity, vsin/ is analogous to eccentricity and the projected 
spin-orbit alignment, A is analagous to the argument of peri- 
astron. Therefore, we can allow a negative vsin/, but when it 
is negative, take its opposite and add tt to A. 



PERFORMANCE 

We ran a typical case of a fit of a simulated data set with 
both transit and RV data with an eccentric orbit, which was 
well-mixed in about 5 minutes. During this time, we ran IDE's 
profiler to analyze the performance to see which of our rou- 
tines may be a bottleneck. While the particular mix will nec- 
essarily depend on the number of data points and the details of 
the model, it is instructive to look at this case study in depth. 

The solution to Kepler's equation is the largest remaining 
bottleneck, consuming 28% of the total computation time. 
Since this iterative solution is necessarily serial, re-writing it 
in Fortran or C and calHng it via CALL_EXTERNAL may be 
a quick way of improving the run time. However, for low- 
eccentricity orbits, the number of iterations is small, and for 
circular orbits, this calculation is trivial. 

At 20%, the next largest contribution is from the program 
that generates the parameter distribution and covariance plots, 
the vast majority going toward the later Because we gener- 
ate a plot for every parameter covariant with every other pa- 
rameter, including all derived parameters, this represents over 
1000 different plots. For each one, it bins every point in ev- 
ery chain into a 2D grid and finds the probability contours. In 
general, this is a useful diagnostic, but not necessary, and can 
be skipped by setting the appropriate flag. 

The next largest contributor is the conversion to the target 
reference frame (12%). It is also an iterative function, and is 
calculated twice at each step in the MCMC chain - one for 
RV and one for photometry. It is important to know that this 
routine cafls EXOFAST_KEPLEREQ, and its contiibution is 
not included in that figure (so as not to double count it). The 
total run time of this calculation is 32%. This is a hefty price 
to pay to include an effect that is nearly negligible, and this 
could be skipped altogether (as has been done to date) without 
any significant difference. Analyzing the run time excluding 
this routine does not qualitatively change the top contributors. 

At 6.3%, the interpolation of the quadratic limb darken- 
ing parameters takes the next biggest slice. The tables are 
large, and simply loading them into memory takes a long time, 
though we make use of global variables to make this quicker 
than it normally would be. 

The last major contributor, at 6.1%, is the calculation of 
the transit light curve (using the all-IDL version). Our ef- 
forts to optimize this calculation are described in detail in Ap- 
pendix |B] Like the conversion to the target reference frame, 
this figure counts the time only spent inside the routine. The 
total contribution of this routine is 12%. Had we left the rou- 
tine alone, the total run time of the MCMC fit would have 
been completely dominated by this calculation, taking an hour 
by itself. We also note that we have not optimized the code 
to calculate the transit flux using a non-linear limb darkened 
stellar model, which was many times slower than the original 
code. Therefore, while in principle, it would be easy to mod- 
ify our code to include non-linear limb darkening, it would 
increase the total run time of the fit by more than an order of 
magnitude. 

Using our comparison between EXOFAST_OCCULTQUAD 
and its Fortran counterpart as a guide, we can guess that 
an all-Fortran version of EXOFAST would probably be 
around 5 times faster Further, considering the execution 
time and relative contribution of the Fortran version of 
EXOFAST_OCCULTQUAD, an improvement of greater than 
50 times is unlikely, even if all other contributions became 
negligible. Recently, however, NVIDIA Graphics Processing 
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Units (GPUs) have been used with CUDA, the Compute Uni- 
fied Device Architecture on highly parallelizable code for a 
significant (factors of 10-100) gains over their CPU counter- 
parts (e.g. Ford 2009). While the links in a DE-MC chain are 
serial, each model calculation within a link and each parallel 
chain could be computed simultaneously to take advantage of 
the GPU architecture if we had enough data points to justify 
the GPU overheads. Indeed, preliminary tests have shown 
a factor of 50 improvement for the calculation of the model 
light curve for Kepler-sized data sets with a relatively inex- 
pensive graphics card (GeForce GTX 480, $200 in 2012) and 
show the potential for even greater improvements. However, 
other bottlenecks currently make the overall boost a small 



fraction of that, making it tough to justify additional hard- 
ware. With some effort, these other bottlenecks may also be 
reduced, making GPUs a promising avenue for large data sets. 

Possibly the easiest way of substantively improving the run 
time is to decrease the total number of steps required for con- 
vergen ce, and therefore require f ewer m odel calculations. Re- 
cently, iForeman-Mackev et al.l (120121) suggested an alterna- 
tive way of stepping, which is worth investigating, though its 
advantage over DE-MC is unclear To stress how much of 
a difference this may make, because of the very strong cor- 
relation between a/R^, and \ogg, a standard implementation 
of the MCMC algorithm converged nearly 1000 times slower 
than its DE-MC counterpart - the difference between 3 min- 
utes and 2 days. 
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